Filip Muntean1,2,*, Majd Al Ali1,2,*, Lucia Donatelli1, Jurriaan van Diggelen2
* equal contributions
1 Vrije Universiteit Amsterdam
2 TNO
Large language models (LLMs) are increasingly used to simulate social interaction and persuasion dynamics, yet their validity as proxies for human cognition and behavior remains unverified. We propose a dual-level evaluation framework that assesses LLM-based agents at both the individual and collective levels. At the individual level, we examine agent fidelity by comparing LLM-generated political personas to human benchmark data. We find that while agents capture broad partisan orientations, they underestimate within-group variability and reproduce stereotypical ideological biases. At the collective level, we deploy Big Five personality-differentiated agents in 1,080 structured dialogues to test the effect of rhetorical strategy on persuasive success. Our simulations reproduce theoretically expected interaction patterns; however, belief shifts are exaggerated relative to human baselines, consistent with LLMs’ tendency toward over-responsiveness. These findings suggest a trade-off between engagement-optimized training objectives and psychological realism, underscoring the need for caution when using LLMs to simulate human behavior. We contribute three resources: a persuasion dynamics dataset, a standardized agent taxonomy of “red” and “green” bots, and a framework for evaluating both individual-agent fidelity and emergent group-level behavior.