AI model comparison — quality, price and open source
The leading AI models from the US, Europe and China, compared by quality (market benchmarks), cost in USD per million tokens and open-source status.
Data as of 2026-07-01 · automated research (Artificial Analysis, LMArena, official pricing) — verify before deciding.
📊 How quality is measured — three indices
We show quality three complementary ways. Here is how each index is built before you read the charts:
① Zendoric Quality (0-100) = equal-weight average of SWE-bench-Pro (33% · real software development, checked against the maker's own figures) + LMArena (33% · human preference, normalised Elo) + Terminal-Bench (33% · agentic terminal capability). If a model lacks one of the three, its weight is split equally between the two that remain; at least two are required.
② AA Index (Artificial Analysis Intelligence Index, 0-100) = a broader composite index (reasoning, science, code, maths). It offers a second reading: the ranking of makers changes depending on which index you use.
③ Cybersecurity (0-100) = capability on expert cyber tasks under a hard "unguided pass@1" protocol (vuln-research and realistic exploitation). We use a non-saturated metric (the top score is around 71, not 100, which leaves headroom) rather than the Cybench "pass@k" that the frontier already saturates. Sources: UK AISI, NIST-CAISI, CVE-Bench. We frame it as capability and risk, not an offensive ranking; where there is no direct evaluation, the value is estimated and marked "est.".
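The weight-sharing rule of index ① can be written compactly. This is only a notational sketch: the symbols $Q$, $P$ and $s_i$ are introduced here for illustration and do not appear on the page. $P$ is the set of the three benchmarks for which a model has a score, and $s_i$ is that score on a 0-100 scale (for LMArena, the normalised Elo).

```latex
Q \;=\; \frac{1}{|P|} \sum_{i \in P} s_i ,
\qquad
P \subseteq \{\text{SWE-bench-Pro},\ \text{LMArena},\ \text{Terminal-Bench}\},
\qquad
|P| \ge 2 .
```

With all three scores present each weighs one third; with two present each weighs one half; with fewer than two the index is left undefined.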
📈 Zendoric Quality over time (frontier makers)
Quality index (0-100) of the top makers (their best model), last 24 months. Dashed line = quality estimated from the AA Index (labs without SWE-bench-Pro). Updated daily.
📈 AA Index over time (frontier makers)
AA Index (Artificial Analysis Intelligence Index, 0-100) of the top makers (their best model), last 24 months. It is a broader composite index (reasoning, science, code, maths) than ours. The historical series is reconstructed by anchoring each maker's trajectory to its current AA. Updated daily.
🛡️ Cybersecurity over time (frontier makers)
Cybersecurity index (0-100) of each maker's best model, last 24 months. Metric: expert cyber tasks under a hard "unguided pass@1" protocol (no hints, one attempt; vuln-research and realistic exploitation). We chose it because it is not saturated: the top score is around 71, not 100, so it still discriminates between models and shows headroom. We drop Cybench "pass@k", where the frontier already scores ~100%. Sources: UK AISI (GPT-5.5 71.4% vs Anthropic preview 68.6%), NIST-CAISI, CVE-Bench. Confidence is high only for OpenAI and Anthropic, which AISI measured directly; the other makers are imputed by proximity, so the whole series is marked "est.". We frame it as capability and risk to govern, not an offensive ranking. Updated daily.
💰 Zendoric Quality vs cost
Flagship models of the top makers by quality (a maker may have several, e.g. Anthropic: Opus 4.8 and Fable 5). HIGHER = more quality; LEFT = cheaper (log axis). Hollow dot = quality estimated (AA Index). Colour by maker.
💰 AA Index vs cost
Same format as the quality/cost chart, but the vertical axis is the AA Index. HIGHER = more capability; LEFT = cheaper (log axis). Hollow dot = AA Index estimated from Terminal-Bench/SWE-bench-Pro. Colour by maker.
🏆 Zendoric Quality (SW dev + arena + agentic)
| Model | Zendoric Quality | SWE-bench-Pro | LMArena | Terminal-Bench | LiveCodeBench | GPQA | ARC-AGI-2 |
|---|---|---|---|---|---|---|---|
| 🇺🇸 Claude Fable 5 (Anthropic · USA) | 90.1 | 80.3 | 1515 | — | 89.78 | 92.6 | — |
| 🇺🇸 GPT-5.6 Sol (preview) (OpenAI · USA) | 78.9 | 63.0 | 1470 | 88.8 | — | 87 | — |
| 🇨🇳 GLM-5.2 (Zhipu AI · China) | 76.9 | 62.1 | 1475 | 81.0 | 82.8 | 78 | 7 |
| 🇺🇸 Claude Opus 4.8 (Anthropic · USA) | 76.5 | 69.2 | 1455 | 82.7 | 88.8 | 84 | 14 |
| 🇺🇸 GPT-5.5 (OpenAI · USA) | 76.3 | 58.6 | 1475 | 82.7 | — | 85 | 16 |
| 🇨🇳 Qwen3.7-Max (Alibaba · China) | 72.6 | 60.6 | 1475 | 69.7 | 91.6 | 81 | 7 |
| 🇺🇸 Claude Sonnet 5 (Anthropic · USA) | 71.8 | 63.2 | — | 80.4 | — | 83 | 12 |
| 🇨🇳 Kimi K2.6 (Moonshot AI · China) | 68.4 | 58.6 | 1460 | 66.7 | 89.6 | 78 | 9 |
| 🇨🇳 DeepSeek V4-Pro (DeepSeek · China) | 66.1 | 55.4 | 1450 | 67.9 | 93.5 | 82 | 9 |
| 🇺🇸 Gemini 3 Pro (Google · USA) | 65.8 | 43.3 | 1501 | 54.2 | — | 84 | 15 |
| 🇺🇸 Claude Sonnet 4.6 (Anthropic · USA) | 62.0 | — | 1430 | 59.1 | — | 80 | 9 |
| 🇺🇸 MAI-1-preview (Microsoft · USA) | 49.4 | 52.8 | — | 46.0 | 87.7 | 84.2 | — |
| 🇺🇸 Claude Mythos 5 (Anthropic · USA) | — | 80.0 | — | — | — | — | — |
| 🇺🇸 Llama 4 Maverick (llama-4-maverick-17b-128e-instruct) (Meta · USA) | — | — | 1370 | — | 43.4 | 70 | 5 |
| 🇺🇸 Grok 4.3 (xAI · USA) | — | — | 1496 | — | 79.4 | 84 | 16 |
| 🇪🇺 Mistral Large 3 (Mistral AI · Europe) | — | — | 1418 | — | 34.4 | 72 | 6 |
| 🇪🇺 Magistral Medium 1.2 (Mistral AI · Europe) | — | — | — | — | 75.0 | 76.26 | 4 |
Quality (Zendoric Quality) = equal-weight average of SWE-bench-Pro (software development) + LMArena (human preference) + Terminal-Bench (agentic capability), the three benchmarks with reliable sources. If one is missing, its weight is shared among those present; at least two are required, otherwise the cell shows "—". LiveCodeBench and GPQA are shown for reference (indicative, may be incomplete) but are not part of the index. ARC-AGI-2 (arcprize.org) tracks progress towards AGI: models still score very low, so AGI remains far off. All values are percentages except LMArena (Elo).
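The averaging rule can be sketched in a few lines of Python. The page does not say how the LMArena Elo is normalised to 0-100, so the linear map below, `(Elo - 1300) / 2` clipped to 0-100, is an assumption: it is inferred from the table, where it reproduces the published Zendoric Quality values to within rounding, and it should not be read as the documented method. The function and constant names are hypothetical.

```python
from typing import Optional

# ASSUMPTION: the Elo normalisation is not documented on the page. These
# constants are inferred from the table rows and may not be the real method.
ELO_FLOOR = 1300.0
ELO_PER_POINT = 2.0


def normalise_elo(elo: float) -> float:
    """Map an LMArena Elo to a 0-100 score (assumed linear map, clipped)."""
    return max(0.0, min(100.0, (elo - ELO_FLOOR) / ELO_PER_POINT))


def zendoric_quality(
    swe_bench_pro: Optional[float],
    lmarena_elo: Optional[float],
    terminal_bench: Optional[float],
) -> Optional[float]:
    """Equal-weight average of the components that are present.

    A missing component's weight is shared among the others. Returns None
    when fewer than two components are available (shown as a dash in the table).
    """
    parts = []
    if swe_bench_pro is not None:
        parts.append(swe_bench_pro)
    if lmarena_elo is not None:
        parts.append(normalise_elo(lmarena_elo))
    if terminal_bench is not None:
        parts.append(terminal_bench)
    if len(parts) < 2:
        return None
    return sum(parts) / len(parts)


if __name__ == "__main__":
    # Rows taken from the table above.
    print(zendoric_quality(62.1, 1475, 81.0))   # GLM-5.2, three components
    print(zendoric_quality(63.2, None, 80.4))   # Claude Sonnet 5, two components
    print(zendoric_quality(80.0, None, None))   # Claude Mythos 5, one component
```

Under this assumed normalisation any Elo of 1500 or above saturates at 100, which would explain why Gemini 3 Pro (Elo 1501) ranks below models with a lower Elo but stronger SWE-bench-Pro and Terminal-Bench scores.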
💵 Economics (USD / 1M tokens)
| Model | Input | Cache | Output |
|---|---|---|---|
| 🇺🇸 Claude Fable 5 (Anthropic · USA) | $10.0 | $1.0 | $50.0 |
| 🇺🇸 GPT-5.6 Sol (preview) (OpenAI · USA) | $5.0 | $0.5 | $30.0 |
| 🇨🇳 GLM-5.2 (Zhipu AI · China) | $0.6 | $0.26 | $2.2 |
| 🇺🇸 Claude Opus 4.8 (Anthropic · USA) | $5.0 | $0.5 | $25.0 |
| 🇺🇸 GPT-5.5 (OpenAI · USA) | $5.0 | $0.5 | $30.0 |
| 🇨🇳 Qwen3.7-Max (Alibaba · China) | $1.2 | $0.25 | $6.0 |
| 🇺🇸 Claude Sonnet 5 (Anthropic · USA) | $2.0 until Aug 31, 2026; $3.0 from Sep 1, 2026 | $0.2 until Aug 31, 2026; $0.3 from Sep 1, 2026 | $10.0 until Aug 31, 2026; $15.0 from Sep 1, 2026 |
| 🇨🇳 Kimi K2.6 (Moonshot AI · China) | $0.6 | $0.16 | $2.5 |
| 🇨🇳 DeepSeek V4-Pro (DeepSeek · China) | $0.28 | $0.03 | $0.87 |
| 🇺🇸 Gemini 3 Pro (Google · USA) | $1.25 | $0.31 | $10.0 |
| 🇺🇸 Claude Sonnet 4.6 (Anthropic · USA) | $3.0 | $0.3 | $15.0 |
| 🇺🇸 MAI-1-preview (Microsoft · USA) | — | — | — |
| 🇺🇸 Claude Mythos 5 (Anthropic · USA) | $10.0 | $1.0 | $50.0 |
| 🇺🇸 Llama 4 Maverick (llama-4-maverick-17b-128e-instruct) (Meta · USA) | $0.2 | — | $0.6 |
| 🇺🇸 Grok 4.3 (xAI · USA) | $3.0 | $0.75 | $15.0 |
| 🇪🇺 Mistral Large 3 (Mistral AI · Europe) | $2.0 | — | $6.0 |
| 🇪🇺 Magistral Medium 1.2 (Mistral AI · Europe) | $0.5 | — | $1.5 |
Claude Sonnet 5: scheduled price increase (same model) — reduced pricing until Aug 31, 2026 and standard pricing from Sep 1, 2026.
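To turn per-million-token prices into a budget figure, a small calculator helps. The sketch below assumes that cached input tokens are billed at the Cache rate instead of the Input rate and that there are no other charges (cache writes, minimums, batch discounts); real provider billing may differ. `Price`, `workload_cost` and the example token volumes are hypothetical; the prices are the two Claude Sonnet 5 tiers from the table.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # prices in the table are quoted per 1M tokens


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens, as listed in the Economics table."""
    input_usd: float
    cache_usd: float
    output_usd: float


def workload_cost(
    price: Price,
    fresh_input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
) -> float:
    """Estimated USD cost of a workload.

    ASSUMPTION: cached input is billed at the cache rate and nothing else
    is charged. This is a simplification, not a provider's billing rule.
    """
    total = (
        fresh_input_tokens * price.input_usd
        + cached_input_tokens * price.cache_usd
        + output_tokens * price.output_usd
    )
    return total / TOKENS_PER_UNIT


# Claude Sonnet 5 pricing tiers from the table.
SONNET_5_REDUCED = Price(input_usd=2.0, cache_usd=0.2, output_usd=10.0)
SONNET_5_STANDARD = Price(input_usd=3.0, cache_usd=0.3, output_usd=15.0)

if __name__ == "__main__":
    # Hypothetical monthly volume: 10M fresh input, 40M cached input, 5M output.
    volume = (10_000_000, 40_000_000, 5_000_000)
    print(workload_cost(SONNET_5_REDUCED, *volume))
    print(workload_cost(SONNET_5_STANDARD, *volume))
```

Because all three rates rise by the same factor on Sep 1, 2026, the cost of any fixed workload on Claude Sonnet 5 rises by 50% regardless of its input/output mix.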
🔓 Open source & type
| Model | Open source | License | Type |
|---|---|---|---|
| 🇺🇸 Claude Fable 5 (Anthropic · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 GPT-5.6 Sol (preview) (OpenAI · USA) | No | Proprietary | Proprietary (API only) |
| 🇨🇳 GLM-5.2 (Zhipu AI · China) | Yes | MIT | Open-weight |
| 🇺🇸 Claude Opus 4.8 (Anthropic · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 GPT-5.5 (OpenAI · USA) | No | Proprietary | Proprietary (API only) |
| 🇨🇳 Qwen3.7-Max (Alibaba · China) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 Claude Sonnet 5 (Anthropic · USA) | No | Proprietary | Proprietary (API only) |
| 🇨🇳 Kimi K2.6 (Moonshot AI · China) | Yes | Modified MIT | Open-weight |
| 🇨🇳 DeepSeek V4-Pro (DeepSeek · China) | Yes | MIT | Open-weight |
| 🇺🇸 Gemini 3 Pro (Google · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 Claude Sonnet 4.6 (Anthropic · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 MAI-1-preview (Microsoft · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 Claude Mythos 5 (Anthropic · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 Llama 4 Maverick (llama-4-maverick-17b-128e-instruct) (Meta · USA) | Yes | Llama 4 Community | Open-weight |
| 🇺🇸 Grok 4.3 (xAI · USA) | No | Proprietary | Proprietary (API only) |
| 🇪🇺 Mistral Large 3 (Mistral AI · Europe) | Yes | Apache-2.0 | Open-weight |
| 🇪🇺 Magistral Medium 1.2 (Mistral AI · Europe) | Yes | Apache-2.0 | Open-weight |
🖥️ Open source you can self-host
Small and medium models you can run on your own machine (laptop/PC/Mac). Quality = Artificial Analysis Intelligence Index (0-100; output quality), the measure with the best coverage of small open models (LMArena does not list sub-32B models). Memory is estimated at 4-bit (Q4) and 8-bit (Q8) quantization; on Apple Silicon it is unified memory (RAM = VRAM).
| Model | Quality (AA Index) | GPQA | Params | RAM Q4 | RAM Q8 | GPU | CPU / Mac | License |
|---|---|---|---|---|---|---|---|---|
| Qwen3.5-27B (Alibaba) | 42 | 85.5 | 27B | 17 GB | 32 GB | ≥24 GB | Limited (better with GPU/Mac ≥32 GB) | Apache-2.0 |
| Gemma 4 31B (Google) | 39 | 84.3 | 31B | 18 GB | 35 GB | ≥24 GB | Limited (better with GPU/Mac ≥32 GB) | Gemma |
| Qwen3.5-35B-A3B (Alibaba) | 37 | 84.2 | 35B | 21 GB | 40 GB | ≥24 GB | Limited (better with GPU/Mac ≥32 GB) | Apache-2.0 |
| Gemma 4 26B A4B (Google) | 31 | 82.3 | 26B | 15 GB | 29 GB | ≥16 GB | Limited (better with GPU/Mac ≥32 GB) | Gemma |
| NVIDIA Nemotron-Cascade-2-30B-A3B (NVIDIA) | 28 | 76.1 | 30B | 18 GB | 34 GB | ≥24 GB | Limited (better with GPU/Mac ≥32 GB) | NVIDIA Open Model |
| gpt-oss-20b (OpenAI) | 24 | 71.5 | 20B | 13 GB | 25 GB | ≥16 GB | Limited (better with GPU/Mac ≥32 GB) | Apache-2.0 |
| Gemma 4 12B (Google) | 22 | 78.8 | 12B | 8 GB | 15 GB | ≥8 GB | Yes (slow on CPU · Mac 16 GB) | Gemma |
| Gemma 4 E4B (Google) | 19 | 58.6 | 4B | 6 GB | 10 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Gemma |
| Gemma 4 E2B (Google) | 15 | 43.4 | 2B | 4 GB | 7 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Gemma |
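One way to read the table is to ask which model gives the highest AA Index within a given memory budget. The sketch below hard-codes the AA Index and RAM columns from the table; `best_model_for` and `headroom_gb` are hypothetical names, and the headroom parameter is an assumption for memory reserved by the operating system and context, which the table does not quantify.

```python
from typing import NamedTuple, Optional


class SelfHostModel(NamedTuple):
    name: str
    aa_index: int
    ram_q4_gb: int
    ram_q8_gb: int


# Values copied from the self-host table (AA Index, RAM Q4, RAM Q8).
MODELS = [
    SelfHostModel("Qwen3.5-27B", 42, 17, 32),
    SelfHostModel("Gemma 4 31B", 39, 18, 35),
    SelfHostModel("Qwen3.5-35B-A3B", 37, 21, 40),
    SelfHostModel("Gemma 4 26B A4B", 31, 15, 29),
    SelfHostModel("NVIDIA Nemotron-Cascade-2-30B-A3B", 28, 18, 34),
    SelfHostModel("gpt-oss-20b", 24, 13, 25),
    SelfHostModel("Gemma 4 12B", 22, 8, 15),
    SelfHostModel("Gemma 4 E4B", 19, 6, 10),
    SelfHostModel("Gemma 4 E2B", 15, 4, 7),
]


def best_model_for(
    memory_gb: float, quant: str = "q4", headroom_gb: float = 0.0
) -> Optional[SelfHostModel]:
    """Return the highest-AA-Index model whose estimated RAM fits the budget.

    quant is "q4" or "q8". headroom_gb (hypothetical knob) is subtracted
    from the budget before comparing. Returns None if nothing fits.
    """
    if quant not in ("q4", "q8"):
        raise ValueError("quant must be 'q4' or 'q8'")
    budget = memory_gb - headroom_gb
    fitting = [
        m for m in MODELS
        if (m.ram_q4_gb if quant == "q4" else m.ram_q8_gb) <= budget
    ]
    return max(fitting, key=lambda m: m.aa_index, default=None)


if __name__ == "__main__":
    for gb in (8, 16, 24, 32):
        print(gb, best_model_for(gb, "q4"), best_model_for(gb, "q8"))
```

The result only reflects whether the weights fit; it says nothing about generation speed, which the "CPU / Mac" column covers qualitatively.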
🗄️ Large open source (server / multi-GPU)
Powerful open models that need a server or multiple GPUs. Quality = LMArena Elo (human preference over output, source lmarena.ai), which does cover large models. For MoE, memory counts total parameters (all experts are loaded). Memory estimated at 4-bit (Q4) and 8-bit (Q8) quantization; on Apple Silicon it is UNIFIED memory (RAM=VRAM).
| Model | Quality (LMArena) | GPQA | Params | RAM Q4 | RAM Q8 | GPU | CPU / Mac | License |
|---|---|---|---|---|---|---|---|---|
| DeepSeek-V4-Pro (DeepSeek) | 1465 | 90.1 | 1600B | 882 GB | 1762 GB | 12× 80 GB (server) | No (GPU server) | MIT |
| Kimi K2.6 (Moonshot AI) | 1460 | 90.5 | 1100B | 552 GB | 1102 GB | 7× 80 GB (server) | No (GPU server) | Modified MIT |
| Qwen3.5-397B-A17B (Alibaba) | 1450 | 88.4 | 397B | 220 GB | 438 GB | 3× 80 GB (server) | No (GPU server) | Apache-2.0 |
| Llama 4 Maverick (llama-4-maverick-17b-128e-instruct) (Meta) | 1420 | 69.8 | 400B | 222 GB | 442 GB | 3× 80 GB (server) | No (GPU server) | Llama 4 Community |
| Mistral Large 3 (Mistral AI) | 1416 | 43.9 | 675B | 373 GB | 744 GB | 5× 80 GB (server) | No (GPU server) | Apache-2.0 |
| GLM-5.2 (Zhipu AI) | 1360 | 91.2 | 744B | 411 GB | 820 GB | 6× 80 GB (server) | No (GPU server) | MIT |
| gpt-oss-120b (OpenAI) | 1353 | 80.1 | 117B | 66 GB | 130 GB | ≥80 GB | No (GPU server) | Apache-2.0 |
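The memory columns can be approximated from the total parameter count. The page does not publish its formula, so the constants below (about 0.55 GB per billion parameters at Q4, 1.1 GB at Q8, plus 2 GB of fixed overhead, rounded down) are assumptions inferred from rows such as DeepSeek-V4-Pro, Qwen3.5-397B-A17B and GLM-5.2; they are not guaranteed to match every row. The GPU count is assumed to be the Q4 footprint divided by 80 GB, rounded up. Function names are hypothetical.

```python
import math

# ASSUMPTION: constants inferred from the table, not documented by the page.
GB_PER_BILLION_PARAMS = {"q4": 0.55, "q8": 1.1}
OVERHEAD_GB = 2.0


def estimate_ram_gb(total_params_billion: float, quant: str) -> int:
    """Rough RAM estimate in GB for a quantised model.

    For MoE models pass the TOTAL parameter count, since all experts
    are loaded into memory.
    """
    gb = total_params_billion * GB_PER_BILLION_PARAMS[quant] + OVERHEAD_GB
    # The small epsilon guards against floating-point error before flooring.
    return math.floor(gb + 1e-9)


def gpus_needed(ram_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Number of GPUs whose combined memory covers ram_gb."""
    return math.ceil(ram_gb / gpu_memory_gb)


if __name__ == "__main__":
    for name, params_b in [
        ("DeepSeek-V4-Pro", 1600),
        ("Qwen3.5-397B-A17B", 397),
        ("GLM-5.2", 744),
    ]:
        q4 = estimate_ram_gb(params_b, "q4")
        q8 = estimate_ram_gb(params_b, "q8")
        print(name, q4, q8, gpus_needed(q4))
```

The estimate covers the weights and a small fixed overhead only; long contexts add KV-cache memory on top, so real deployments usually need more than the minimum GPU count.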