Optimizing Multi-agent Behavior for Interactive Autonomous Driving with Equilibrium Value Estimation

Under review at IEEE Transactions on Intelligent Transportation Systems, 2026

1 College of Transportation, Tongji University, Shanghai, China
2 Key Laboratory of Road and Traffic Engineering, Ministry of Education, Shanghai, China
EGPO Pipeline
Figure 1. The EGPO Pipeline. Our framework enables controllable generation of multi-modal driving behaviors through Equilibrium Value Estimation.

Abstract

We optimize multi-agent behaviors with Equilibrium Value Estimation (EVE) to strengthen autonomous vehicle (AV) testing. We construct a closed-loop, agent-based reinforcement learning framework in which smart test agents are trained by interacting with the environment, while their diverse interaction behaviors are meta-learned through the EVE mechanism to support AV testing.

By tuning the EVE parameter, EGPO can controllably generate multi-modal driving behaviors that expose AVs to a broad spectrum of realistic and safety-critical scenarios. Compared with log-replay methods, which reproduce diverse but non-interactive trajectories, and rule-based controllers, which yield interactive yet overly compliant and homogeneous behaviors, EGPO achieves both interactivity and behavioral diversity in zero-shot testing.
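As a rough illustration of how a single control parameter can trade compliance against diversity, the sketch below samples a behavior mode from a softmax over equilibrium value estimates, with a temperature-like knob `tau` standing in for the EVE parameter. The function name, interface, and softmax mechanism are illustrative assumptions, not the paper's actual EGPO implementation.

```python
import math
import random

def sample_behavior(equilibrium_values, tau, rng=random):
    """Sample a behavior-mode index from a softmax over equilibrium
    value estimates.

    `tau` is a hypothetical stand-in for the EVE parameter:
    small tau -> near-greedy, compliant behavior;
    large tau -> diverse, multi-modal behavior.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(equilibrium_values)
    weights = [math.exp((v - m) / tau) for v in equilibrium_values]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Inverse-CDF sampling over the categorical distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

rng = random.Random(0)
values = [1.0, 0.2, 0.1]  # illustrative equilibrium value estimates

# Low temperature concentrates on the highest-value (compliant) mode;
# high temperature spreads probability mass across all modes.
greedy = [sample_behavior(values, tau=0.05, rng=rng) for _ in range(100)]
diverse = [sample_behavior(values, tau=10.0, rng=rng) for _ in range(100)]
```

With `tau=0.05` nearly every sample picks mode 0, while `tau=10.0` yields a mixture over all three modes, mirroring how a single scalar can controllably widen the behavior distribution.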

Qualitative Results

EGPO demonstrates superior interactivity and diversity compared to baselines.

Comparison with Baselines

Vanilla Methods
EGPO (Ours)

Diverse Interaction Scenarios

Case #1
Case #2
Case #3
Case #4

Scaling Self-play

Case #5: Urban road interchange
Case #6: Urban expressway

Citation

@article{fan2026optimizing,
  title={Optimizing Multi-agent Behavior for Interactive Autonomous Driving with Equilibrium Value Estimation},
  author={Fan, Jialin and Ni, Ying and Zhao, Yujia and Sun, Jie and Sun, Jian},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  year={2026}
}