The ACEBench Leaderboard presents a comprehensive evaluation of how well Large Language Models use tools. It shows model performance across three scenario types (Normal, Special, and Agent-based interactions), broken down into sub-categories, which gives a granular view of how effectively each model solves complex problems.
Model | Normal: Atom | Normal: Single-Turn | Normal: Multi-Turn | Normal: Similar API | Normal: Preference | Normal: Summary | Special | Agent | Overall
---|---|---|---|---|---|---|---|---|---
**Closed-Source Large Language Models** | | | | | | | | | |
**Open-Source Large Language Models** | | | | | | | | | |