2 Key Laboratory of Road and Traffic Engineering, Ministry of Education, Shanghai, China
Abstract
We optimize multi-agent behaviors with Equilibrium Value Estimation (EVE) to strengthen autonomous vehicle (AV) testing. We construct a closed-loop, agent-based reinforcement learning framework in which smart test agents are trained by interacting with the environment, while their diverse interaction behaviors are meta-learned through the EVE mechanism to support AV testing.
By tuning the EVE parameter, EGPO can controllably generate multi-modal driving behaviors that expose AVs to a broad spectrum of realistic and safety-critical scenarios. Compared with log-replay methods that reproduce diverse but non-interactive trajectories, and rule-based controllers that yield interactive yet overly compliant and homogeneous behaviors, EGPO jointly achieves both interactivity and behavioral diversity in zero-shot testing.
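To illustrate the idea of modulating behavior with a single parameter, here is a hypothetical toy sketch (not the paper's actual EGPO/EVE implementation; all function names, action labels, and numbers are illustrative assumptions): an agent scores candidate maneuvers by blending its own expected return with an equilibrium-style joint value estimate, so one scalar shifts it between aggressive and cooperative modes.

```python
# Hypothetical sketch of EVE-style behavior modulation. All names and
# numbers are illustrative assumptions, not the paper's implementation.

def eve_action_score(ego_value, eq_value, alpha):
    """Blend ego return with an equilibrium value estimate.

    alpha in [0, 1] shifts behavior from aggressive (near 0,
    self-interested) to cooperative (near 1, socially balanced).
    """
    return (1.0 - alpha) * ego_value + alpha * eq_value

# Toy candidate maneuvers with (ego gain, equilibrium value) estimates.
actions = {
    "cut_in":    (1.0, 0.2),  # high ego gain, low joint value
    "yield":     (0.3, 0.9),  # low ego gain, high joint value
    "keep_lane": (0.5, 0.6),  # moderate on both
}

def pick(alpha):
    """Return the maneuver with the highest blended score."""
    return max(actions, key=lambda a: eve_action_score(*actions[a], alpha))

print(pick(0.1))  # aggressive setting -> "cut_in"
print(pick(0.9))  # cooperative setting -> "yield"
```

Under this toy model, sweeping the blending parameter produces a spectrum of interaction styles from the same scoring rule, which mirrors how tuning the EVE parameter is described as yielding multi-modal test-agent behaviors.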
Qualitative Results
EGPO demonstrates superior interactivity and diversity compared to baselines.
Comparison with Baselines
Diverse Interaction Scenarios
Scaling Self-play
Citation
@article{fan2026optimizing,
title={Optimizing Multi-agent Behavior for Interactive Autonomous Driving with Equilibrium Value Estimation},
author={Fan, Jialin and Ni, Ying and Zhao, Yujia and Sun, Jie and Sun, Jian},
journal={IEEE Transactions on Intelligent Transportation Systems},
year={2026}
}