AI Model Benchmarks

Tiered ranking of large language models optimized for agentic workflows. Updated continuously.

Tier 1: Frontier

Complex reasoning · Strategy · Planning · External dev only

7 models

Claude Opus 4.7 (Anthropic · Apr 2026)
  Cost /1M: $5 in / $25 out
  Context: 1M
  SWE-Verified: 87.6% · SWE-Pro: 64.3% · MCP-Atlas: 77.3%

GLM-5.1 (Z.AI · Apr 2026)
  Cost /1M: $1.40 in / $4.40 out
  Context: 200K
  SWE-Pro: 58.4% · SWE-Verified: 77.8% · GPQA Diamond: 83.9%

Kimi K2.6 (Moonshot AI · Apr 2026)
  Cost /1M: $0.95 in / $4 out
  Context: 256K
  SWE-Verified: 80.2% · SWE-Pro: 58.6% · Terminal-Bench 2.0: 66.7%

Claude Opus 4.6 (Anthropic · Feb 2026)
  Cost /1M: $5 in / $25 out
  Context: 1M
  SWE-Verified: 80.8% · Terminal-Bench 2.0: 65.4% · ARC-AGI-2: 68.9%

GPT-5.4 (OpenAI · Mar 2026)
  Cost /1M: $2.50 in / $15 out
  Context: 1.05M
  SWE-Pro: 57.7% · OSWorld: 75.0% · GPQA Diamond: 92.8%

GPT-5.5 (OpenAI · Apr 2026)
  Cost /1M: $5.00 in / $30 out
  Context: 1M
  SWE-Pro: 58.6% · Terminal-Bench 2.0: 82.7% · GDPval: 84.9%

DeepSeek V4 Pro (DeepSeek · Apr 2026)
  Cost /1M: $1.74 in / $3.48 out
  Context: 1M
  SWE-Verified: ~80.6% · MMLU-Pro: ~73.5 · HumanEval: ~76.8%
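The prices above are quoted per 1M tokens, split into input and output rates. As a quick sanity check on what a single call actually costs, the arithmetic is just two rate multiplications; a minimal sketch (the helper name and the example token counts are illustrative, not from the table):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request, given per-1M-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Using Claude Opus 4.7's listed rates ($5 in / $25 out per 1M tokens),
# a 20K-token prompt with a 2K-token reply costs:
cost = request_cost(20_000, 2_000, 5.0, 25.0)  # 0.10 + 0.05 = $0.15
```

Note that output tokens dominate the bill at typical 5x output pricing once replies grow long, which is why agentic workloads with verbose tool traces can cost more than the input-heavy prompt suggests.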
Tier 2: Agent Execution

Tool calls · Long task chains · Multi-step pipelines

4 models

Gemini 3.1 Pro (Google · Feb 2026)
  Cost /1M: $2 in / $12 out
  Context: 1M
  ARC-AGI-2: 77.1% · GPQA Diamond: 94.3% · SWE-Verified: 80.6%

MiniMax M2.7 (MiniMax · Mar 2026)
  Cost /1M: $0.30 in / $1.20 out
  Context: 205K
  SWE-Pro: 56.2% · Terminal-Bench 2.0: 57.0% · Vibe-Pro: 55.6%

Kimi K2.5 (Moonshot AI · Feb 2026)
  Cost /1M: $0.60 in / $3.00 out
  Context: 256K
  HLE w/ tools: 50.2% · BrowseComp: 79.4% · SWE-Verified: 76.8%

DeepSeek V3.2 (DeepSeek · Dec 2025)
  Cost /1M: $0.27 in / $0.41 out
  Context: 164K
  SWE-Verified: 70.0% · Aider polyglot: 74.2%
Tier 3: Balanced

Content · Code · Research · Day-to-day tasks

5 models

Claude Sonnet 4.6 (Anthropic · Feb 2026)
  Cost /1M: $3 in / $15 out
  Context: 1M
  SWE-Verified: 79.9% · Computer use: 94.0% · AI Index: 52/100

GPT-5.4 mini (OpenAI · Various)
  Cost /1M: $0.75 in / $4.50 out
  Context: 400K
  SWE-Pro: 54.4% · Tool call r1: 93.4% · OSWorld: 72.1%

Qwen3.6 Plus (Alibaba · Apr 2026)
  Cost /1M: currently free
  Context: 1M
  SWE-Verified: 78.8%

Llama 4 Maverick (Meta · 2026)
  Cost /1M: $0.19–$0.49
  Context: 1M
  MMLU: 85.5% · SWE-Verified: ~68%

Mistral Small 4 (Mistral · Mar 2026)
  Cost /1M: $0.15 in / $0.60 out
  Context: 256K
  AA Intelligence Index: 27/100 · AA LCR score: 0.72 · MATH-500: ~93.6%
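Because the tiers mix very different price points, it can help to relate a shared benchmark to cost before picking a workhorse model. A minimal sketch that ranks a few of the models listed above by SWE-Verified score per dollar of input price (the scoring heuristic itself is an assumption, not part of this ranking; figures are taken from the cards above):

```python
# (model, SWE-Verified %, input price $/1M tokens), from the listings above
models = [
    ("Claude Sonnet 4.6", 79.9, 3.00),
    ("Kimi K2.5",         76.8, 0.60),
    ("DeepSeek V3.2",     70.0, 0.27),
]

# Rank by benchmark points per input dollar (crude value heuristic:
# it ignores output pricing, context limits, and tier placement).
ranked = sorted(models, key=lambda m: m[1] / m[2], reverse=True)
```

On this crude metric the cheap Tier 2 models win easily; the heuristic is only a starting point, since absolute capability still matters for tasks at the frontier.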
Open Source (Runs on Device)

Local inference · Zero API cost · Full privacy

12 models
64 GB RAM tier (48–64 GB)

Maximum quality local inference. Frontier-grade reasoning without leaving your machine.

5 models

Qwen3.6-27B (Alibaba · 27B dense)
  Context: 256K
  VRAM (Q4): 20 GB
  SWE-Verified: 80.0% · GPQA Diamond: 89.0% · MMLU-Pro: 88.5%

Qwen3.6-35B-A3B (Alibaba · 35B MoE, 3B active)
  Context: 256K
  VRAM (Q4): 14 GB
  SWE-Verified: 82.0% · GPQA Diamond: 88.5% · MMLU-Pro: 87.0%

Gemma 4 31B (Google · 31B dense)
  Context: 256K
  VRAM (Q4): 18–20 GB
  MMLU-Pro: 85.2% · GPQA Diamond: 84.3% · AIME 2026: 89.2%

Gemma 4 26B (Google · 26B MoE, 3.8B active)
  Context: 256K
  VRAM (Q4): 12 GB
  MMLU-Pro: 78.5% · LiveCodeBench: 68.0% · Arena AI ELO: 1380

Qwen3.5-27B (Alibaba · 27B)
  Context: 256K
  VRAM (Q4): 16 GB
  SWE-Verified: 72.4% · GPQA Diamond: 85.8% · MMLU-Pro: 86.1%
32 GB RAM tier (24–32 GB)

Mid-size models with serious reasoning power. The sweet spot for power users.

4 models

Qwen3.5-27B (Alibaba · 27B)
  Context: 256K
  VRAM (Q4): 16 GB
  SWE-Verified: 72.4% · GPQA Diamond: 85.8% · MMLU-Pro: 86.1%

Qwen3.5-35B-A3B (Alibaba · 35B MoE, 3B active)
  Context: 256K
  VRAM (Q4): 10–12 GB
  SWE-Verified: 74.0% · GPQA Diamond: 83.5% · MMLU-Pro: 84.8%

Gemma 4 26B (Google · 26B MoE, 3.8B active)
  Context: 256K
  VRAM (Q4): 8–12 GB
  MMLU-Pro: 78.5% · LiveCodeBench: 68.0% · Arena AI ELO: 1380

Carnice-27B (Kai OS · 27B)
  Context: 128K
  VRAM (Q4): 16 GB
16 GB RAM tier (8–16 GB)

Lightweight models for laptops and everyday machines. Fast inference, low memory footprint.

3 models

Qwen3.5-9B (Alibaba · 9B)
  Context: 256K
  VRAM (Q4): 6 GB
  SWE-Verified: 76.2% · GPQA Diamond: 81.7% · MMLU-Pro: 82.5%

Gemma 4 E4B (Google · 4B effective)
  Context: 128K
  VRAM (Q4): 6 GB
  MMLU-Pro: 72.0% · LiveCodeBench: 58.0% · Arena AI ELO: 1280

Carnice-9B (Kai OS · 9B)
  Context: 128K
  VRAM (Q4): 6 GB
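The VRAM (Q4) figures in the local tiers line up with a common rule of thumb for 4-bit quantized weights. A minimal sketch of that estimate, assuming roughly 4.5 bits per weight (typical of Q4_K-style quantization) and a 25% overhead factor for KV cache and runtime buffers (both numbers are assumptions, not values from the tables above):

```python
def q4_vram_gb(params_billions: float, overhead: float = 1.25) -> float:
    """Rough VRAM estimate for a ~4-bit quantized model.

    params_billions: total parameter count in billions.
    overhead: multiplier for KV cache and runtime buffers (assumed).
    """
    bytes_per_param = 4.5 / 8  # ~0.56 bytes/weight at Q4 (assumed)
    return params_billions * bytes_per_param * overhead

# A 27B dense model lands near the 18-20 GB listed for Gemma 4 31B /
# Qwen3.6-27B class models; a 9B model lands near the listed 6 GB.
estimate_27b = q4_vram_gb(27)  # roughly 19 GB
estimate_9b = q4_vram_gb(9)    # roughly 6 GB
```

For MoE models the whole parameter count must still fit in memory even though only a few billion parameters are active per token, which is why the 35B-A3B entries need less compute but not dramatically less VRAM than their size suggests; long contexts also grow the KV cache beyond this flat overhead factor.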