SalesBench: The Long-Horizon Agent-to-Agent Eval
Most agent benchmarks are either long-horizon or agent-to-agent. SalesBench is both: a small model runs an insurance sales pipeline against an LLM buyer and is scored by the revenue it closes. In the 100-lead setting, a trained 2B model converts 18.5% of leads versus 0.3% for the untrained base model. That is a 61.7x lift, and it grows as the eval gets harder.
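The headline lift is simply the ratio of the two conversion rates. Here is a minimal sketch of that arithmetic. The `lift` helper is a hypothetical name used only for illustration and is not part of SalesBench.

```python
def lift(trained_rate: float, base_rate: float) -> float:
    """Hypothetical helper: multiplicative improvement of trained over base."""
    if base_rate <= 0:
        raise ValueError("base rate must be positive to define a lift")
    return trained_rate / base_rate


# Headline conversion rates in the 100-lead setting, as fractions.
trained = 0.185  # 18.5% for the trained 2B model
base = 0.003     # 0.3% for the untrained base model

print(f"lift = {lift(trained, base):.1f}x")  # lift = 61.7x
```

Because the base rate is so small, the ratio is sensitive to it. With the same 18.5% trained rate, a base rate of 0.2% would give a 92.5x lift, so the absolute rates are worth reading alongside the multiplier.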