ACEBench: A Comprehensive Evaluation of LLM Tool Usage

The ACEBench Leaderboard presents a comprehensive evaluation of Large Language Models (LLMs) in tool usage. It shows model performance across several scenario types, including Normal, Special, and Agent-based interactions, giving a granular view of how effectively each model solves complex problems.

Last Updated: 2025-07-21
Columns: Model | Normal (Atom, Single-Turn, Multi-Turn, Similar API, Preference, Summary) | Special | Agent | Overall
Closed-Source Large Language Models
Open-Source Large Language Models