MedAgentSim: Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions

Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE

*Indicates Equal Contribution

Abstract

In this work, we introduce MedAgentSim, an open-source simulated clinical environment with doctor, patient, and measurement agents designed to evaluate and enhance LLM performance in dynamic diagnostic settings. Unlike prior approaches, our framework requires doctor agents to actively engage with patients through multi-turn conversations, requesting relevant medical examinations (e.g., temperature, blood pressure, ECG) and imaging results (e.g., MRI, X-ray) from a measurement agent to mimic the real-world diagnostic process. Additionally, we incorporate self-improvement mechanisms that allow models to iteratively refine their diagnostic strategies. We enhance LLM performance in our simulated setting by integrating multi-agent discussions, chain-of-thought reasoning, and experience-based knowledge retrieval, facilitating progressive learning as doctor agents interact with more patients. We also introduce an evaluation benchmark for assessing the LLM’s ability to engage in dynamic, context-aware diagnostic interactions. While MedAgentSim is fully automated, it also supports a user-controlled mode, enabling human interaction with either the doctor or patient agent. Comprehensive evaluations in various simulated diagnostic scenarios demonstrate the effectiveness of our approach.
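The multi-turn interaction described above can be illustrated with a minimal sketch. This is not the MedAgentSim implementation: the message prefixes `REQUEST TEST:` and `DIAGNOSIS:`, the `Case` structure, the function names, and the toy case are all assumptions made for illustration, and the scripted doctor stands in for an LLM-driven policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical message prefixes (assumed for this sketch, not taken from the paper).
REQUEST_TEST = "REQUEST TEST:"
DIAGNOSIS = "DIAGNOSIS:"

Turn = Tuple[str, str]  # (speaker, text)


@dataclass
class Case:
    """A toy patient case: what the patient reports and what tests would show."""
    symptoms: str
    test_results: Dict[str, str]


def patient_agent(case: Case, question: str) -> str:
    # A real patient agent would be an LLM conditioned on the case; here it
    # simply reports the symptoms regardless of the question.
    return case.symptoms


def measurement_agent(case: Case, test_name: str) -> str:
    # Returns the stored result for the requested examination, if any.
    return case.test_results.get(test_name.lower(), "not available")


def scripted_doctor(history: List[Turn]) -> str:
    """Stand-in for an LLM doctor: ask a question, order a test, then diagnose."""
    doctor_turns = sum(1 for speaker, _ in history if speaker == "doctor")
    if doctor_turns == 0:
        return "What brings you in today?"
    if doctor_turns == 1:
        return f"{REQUEST_TEST} ECG"
    return f"{DIAGNOSIS} atrial fibrillation"


def run_consultation(
    case: Case,
    doctor: Callable[[List[Turn]], str],
    max_turns: int = 10,
) -> Tuple[Optional[str], List[Turn]]:
    """Route each doctor message to the patient or the measurement agent."""
    history: List[Turn] = []
    for _ in range(max_turns):
        message = doctor(history)
        history.append(("doctor", message))
        if message.startswith(DIAGNOSIS):
            return message[len(DIAGNOSIS):].strip(), history
        if message.startswith(REQUEST_TEST):
            test_name = message[len(REQUEST_TEST):].strip()
            history.append(("measurement", measurement_agent(case, test_name)))
        else:
            history.append(("patient", patient_agent(case, message)))
    return None, history  # no diagnosis reached within the turn budget


# Made-up toy case used only to exercise the loop.
toy_case = Case(
    symptoms="I have palpitations and feel dizzy.",
    test_results={"ecg": "irregularly irregular rhythm, absent P waves"},
)
diagnosis, transcript = run_consultation(toy_case, scripted_doctor)
```

In this sketch the doctor never sees the case directly; it only sees the transcript, so any examination result has to be requested explicitly, which mirrors the active information gathering the abstract describes.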

Methodology

Simulation Architecture

Overview of the MedAgentSim architecture, consisting of the conversation phase and experience replay phase.
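As a rough illustration of how an experience replay phase could feed earlier cases back to the doctor agent, the sketch below stores past cases and retrieves the most similar ones for a new patient. The `ExperienceBuffer` class, its method names, the example cases, and the use of token-level Jaccard similarity are all assumptions for this example; the retrieval method actually used by MedAgentSim is not specified on this page and may well rely on learned embeddings instead.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Experience:
    """A past case: a short summary and the diagnosis that was reached."""
    summary: str
    diagnosis: str


def _tokens(text: str) -> Set[str]:
    # Lowercased whitespace tokens; a deliberately simple stand-in for embeddings.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two texts, in [0, 1]."""
    tokens_a, tokens_b = _tokens(a), _tokens(b)
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


class ExperienceBuffer:
    """Hypothetical store of past cases that grows as more patients are seen."""

    def __init__(self) -> None:
        self._items: List[Experience] = []

    def add(self, summary: str, diagnosis: str) -> None:
        self._items.append(Experience(summary, diagnosis))

    def retrieve(self, query: str, k: int = 2) -> List[Experience]:
        # Rank stored cases by similarity to the new case and keep the top k.
        ranked = sorted(
            self._items,
            key=lambda exp: jaccard(query, exp.summary),
            reverse=True,
        )
        return ranked[:k]


# Made-up entries used only to exercise the retrieval logic.
buffer = ExperienceBuffer()
buffer.add("crushing chest pain radiating to left arm", "myocardial infarction")
buffer.add("fever and productive cough", "pneumonia")
buffer.add("sharp chest pain worse on inspiration", "pleuritis")

similar = buffer.retrieve("chest pain radiating to left arm", k=2)
```

The retrieved cases could then be placed in the doctor agent's prompt as worked examples, which is one plausible way for diagnostic behavior to improve as the buffer grows.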

Sequential Progression of Simulation

The sequential progression of the simulation and events at each stage.

Results and Performance

Performance Comparison

Performance of MedAgentSim compared with baseline models across multiple benchmarks, including NEJM, MedQA, and MIMIC-IV.

Bias Reduction

Impact of cognitive and implicit biases on model accuracy. This radar plot shows how the accuracy of different models varies under each bias condition. Points farther from the center indicate higher accuracy and therefore greater robustness to that bias, while more compact shapes suggest higher sensitivity.

Bias Insightful Plots

The left figure shows the initial bias distribution, while the right figure illustrates bias reduction after incorporating additional features.

BibTeX

If you use MedAgentSim in your research, please cite our paper:

@misc{almansoori2025selfevolvingmultiagentsimulationsrealistic,
      title={Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions},
      author={Mohammad Almansoori and Komal Kumar and Hisham Cholakkal},
      year={2025},
      eprint={2503.22678},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.22678},
}