SalesBench: The Long-Horizon Agent-to-Agent Eval
A long-horizon RL environment where a small model learns to manage an insurance sales pipeline against an LLM buyer, scored by revenue closed instead of by an LLM judge. The trained model vastly outperforms the untrained base, and the gap widens as the eval gets harder.