Zendoric

AI model comparison — quality, price and open source

The leading AI models from the US, Europe and China, compared by quality (market benchmarks), cost in USD per million tokens and open-source status.

Data as of 2026-07-01 · automated research (Artificial Analysis, LMArena, official pricing) — verify before deciding.

📊 How quality is measured — three indices

We measure quality in three complementary ways. Before you read the charts, here is how each index is built:

① Zendoric Quality (0-100) = equal-thirds average of SWE-bench-Pro (one third · real software development, checked against the maker's own figures) + LMArena (one third · human preference, normalised Elo) + Terminal-Bench (one third · agentic terminal capability). If a model lacks one of the three, its weight is shared among those present (at least two are required).

② AA Index (Artificial Analysis Intelligence Index, 0-100) = a broader composite index (reasoning, science, code, maths). It gives a second reading: depending on how you measure, the maker ranking changes.

③ Cybersecurity (0-100) = capability on expert cyber tasks (hard «unguided pass@1» protocol: vuln-research and realistic exploitation). We use a non-saturated metric (the top is around 71, not 100, leaving headroom) rather than Cybench «pass@k», which the frontier already saturates. Sources: UK AISI, NIST-CAISI, CVE-Bench. We frame it as capability and risk, not an offensive ranking; where there is no direct evaluation, the value is estimated and marked «est.».
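To see why a «pass@k» score saturates while «unguided pass@1» keeps headroom, here is a minimal sketch using the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k). It is an illustration only: the page does not say which estimator its sources use, and the function name and the sample counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled attempts succeeds.

    n = attempts drawn per task, c = attempts that succeeded, k = attempt budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed, which yields 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 2 successes out of 10 attempts.
for k in (1, 5, 10):
    print(f"pass@{k} = {pass_at_k(10, 2, k):.3f}")
```

With the same underlying success rate, the score climbs from 0.2 at k=1 to 1.0 at k=10, so a single-attempt protocol separates models that a large attempt budget would rate as equal.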

📈 Zendoric Quality over time (frontier makers)

Quality index (0-100) of the top makers (their best model), last 24 months. Dashed line = quality estimated from the AA Index (labs without SWE-bench-Pro). Updated daily.

[Chart: Zendoric Quality over time. Vertical axis: quality index (ticks 26 to 95); horizontal axis: month, 24-08 to 26-07.]
Series by maker: Anthropic (Sonnet 3.5 v2, Sonnet 3.7, Opus 4, Opus 4.7, Opus 4.8, Fable 5); OpenAI (o1, o3, GPT-5, GPT-5.5, GPT-5.6 Sol preview); xAI (Grok 4, Grok 4.3 est.); Zhipu AI (GLM-4-Plus, GLM-4.5, GLM-5, GLM-5.2); Alibaba (Qwen2.5, Qwen3, Qwen3.5, Qwen3.7-Max); Moonshot AI (Kimi k1.5, Kimi K2, Kimi K2.6); DeepSeek (V3, R1, V3.2, V4-Pro); Google (Gemini 2.0, Gemini 2.5, Gemini 3 Pro); Microsoft (MAI-1-preview); Mistral AI (Large 3, est.); Meta (Llama 4 Maverick, est.).

📈 AA Index over time (frontier makers)

AA Index (Artificial Analysis Intelligence Index, 0-100) of the top makers (their best model), last 24 months. It is a broader composite index (reasoning, science, code, maths) than ours. The historical series is reconstructed by anchoring each maker's trajectory to its current AA. Updated daily.

[Chart: AA Index over time. Vertical axis: AA Index (ticks 8 to 65); horizontal axis: month, 24-08 to 26-07.]
Series by maker: Anthropic (Sonnet 3.5 v2, Sonnet 3.7, Opus 4, Opus 4.7, Opus 4.8, Fable 5); OpenAI (o1, o3, GPT-5, GPT-5.5, GPT-5.6 Sol preview); xAI (Grok 4, Grok 4.3); Zhipu AI (GLM-4-Plus, GLM-4.5, GLM-5, GLM-5.2); Alibaba (Qwen2.5, Qwen3, Qwen3.5, Qwen3.7-Max); DeepSeek (V3, R1, V3.2, V4-Pro); Moonshot AI (Kimi k1.5, Kimi K2, Kimi K2.6); Google (Gemini 2.0, Gemini 2.5, Gemini 3 Pro); Microsoft (MAI-1-preview); Mistral AI (Large 3); Meta (Llama 4 Maverick).

🛡️ Cybersecurity over time (frontier makers)

Cybersecurity index (0-100) of each maker's best model, last 24 months. Metric: EXPERT cyber tasks under a hard «unguided pass@1» protocol (no hints, one attempt; vuln-research and realistic exploitation). We pick it because it is NOT saturated — the top is around 71, not 100, so it discriminates and shows headroom (we drop Cybench «pass@k», where the frontier already scores ~100%). Sources: UK AISI (GPT-5.5 71.4% vs Anthropic preview 68.6%), NIST-CAISI, CVE-Bench. High confidence only for OpenAI/Anthropic (measured by AISI); the rest imputed by proximity → the whole series is marked «est.». We frame it as capability and RISK to govern, not an offensive ranking. Updated daily.

[Chart: Cybersecurity index over time. Vertical axis: cybersecurity index (ticks 18 to 76); horizontal axis: month, 24-08 to 26-07. Every series ends in a value marked «est.».]
Series by maker: OpenAI (o1, o3, GPT-5, GPT-5.5, GPT-5.6 Sol); Anthropic (Sonnet 3.5 v2, Sonnet 3.7, Opus 4, Opus 4.7, Opus 4.8, Fable 5, Claude Mythos 5); Zhipu AI (GLM-4-Plus, GLM-4.5, GLM-5, GLM-5.2); Google (Gemini 2.0, Gemini 2.5, Gemini 3 Pro); Moonshot AI (Kimi k1.5, Kimi K2, Kimi K2.6); Microsoft (MAI-1); Alibaba (Qwen2.5, Qwen3, Qwen3.5, Qwen3.7-Max); xAI (Grok 4, Grok 4.3); DeepSeek (V3, R1, V3.2, V4-Pro); Mistral AI (Large 3); Meta (Llama 4).

💰 Zendoric Quality vs cost

Flagship models of the top makers by quality (a maker may have several, e.g. Anthropic: Opus 4.8 and Fable 5). HIGHER = more quality; LEFT = cheaper (log axis). Hollow dot = quality estimated (AA Index). Colour by maker.

[Chart: Zendoric Quality vs cost. Vertical axis: quality index (ticks 26 to 95); horizontal axis: output cost ($/1M tokens, log scale, $0.5 to $50). A region is highlighted as "Sweet spot".]
Makers in legend: OpenAI, Anthropic, Google, DeepSeek, Alibaba, Moonshot AI, Zhipu AI, xAI, Mistral AI, Meta.
Points: Claude Fable 5, GPT-5.6 Sol (preview), Claude Opus 4.8, GPT-5.5, GLM-5.2, Grok 4.3 (est.), Qwen3.7-Max, Claude Sonnet 5, Kimi K2.6, DeepSeek V4-Pro, Gemini 3 Pro, Claude Sonnet 4.6, Mistral Large 3 (est.), Llama 4 Maverick (llama-4-maverick-17b-128e-instruct) (est.).
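As a rough companion to the chart, the sketch below computes a point's position on a base-10 log cost axis and a simple quality-per-dollar ratio. The ratio is an assumption made for illustration and is not the page's definition of the sweet spot; the three data points use the Zendoric Quality and output prices listed on this page, and the function names are hypothetical.

```python
from math import log10

# (model, Zendoric Quality, output price in USD per 1M tokens), from this page.
POINTS = [
    ("Claude Fable 5", 90.1, 50.0),
    ("GLM-5.2", 76.9, 2.2),
    ("DeepSeek V4-Pro", 66.1, 0.87),
]


def log_position(price_usd: float) -> float:
    """Horizontal position on a base-10 log axis (0.0 corresponds to $1)."""
    return log10(price_usd)


def quality_per_dollar(quality: float, price_usd: float) -> float:
    """Hypothetical value ratio: quality points per output dollar."""
    return quality / price_usd


# List the points from best to worst ratio.
for name, quality, price in sorted(
    POINTS, key=lambda p: quality_per_dollar(p[1], p[2]), reverse=True
):
    print(f"{name}: x={log_position(price):+.2f}, "
          f"{quality_per_dollar(quality, price):.1f} pts/$")
```

A plain ratio rewards very cheap models heavily, so it is best read alongside the absolute quality on the vertical axis rather than on its own.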

💰 AA Index vs cost

Same format as the quality/cost chart, but the vertical axis is the AA Index. HIGHER = more capability; LEFT = cheaper (log axis). Hollow dot = estimated AA (Terminal-Bench/SWE-Pro). Colour by maker.

[Chart: AA Index vs cost. Vertical axis: AA Index (ticks 6 to 65); horizontal axis: output cost ($/1M tokens, log scale, $0.5 to $50). A region is highlighted as "Sweet spot".]
Makers in legend: OpenAI, Anthropic, Google, Meta, xAI, Mistral AI, DeepSeek, Alibaba, Moonshot AI, Zhipu AI.
Points: Claude Fable 5, Claude Mythos 5, GPT-5.6 Sol (preview), Claude Opus 4.8, GPT-5.5, Claude Sonnet 5, Grok 4.3, GLM-5.2, Qwen3.7-Max, DeepSeek V4-Pro, Kimi K2.6, Gemini 3 Pro, Claude Sonnet 4.6, Mistral Large 3, Llama 4 Maverick (llama-4-maverick-17b-128e-instruct), Magistral Medium 1.2.

🏆 Zendoric Quality (SW dev + arena + agentic)

ModelZendoric QualitySWE-bench-ProLMArenaTerminal-BenchLiveCodeBenchGPQAARC-AGI-2
🇺🇸 Claude Fable 5Anthropic · USA90.180.3151589.7892.6
🇺🇸 GPT-5.6 Sol (preview)OpenAI · USA78.963.0147088.887
🇨🇳 GLM-5.2Zhipu AI · China76.962.1147581.082.8787
🇺🇸 Claude Opus 4.8Anthropic · USA76.569.2145582.788.88414
🇺🇸 GPT-5.5OpenAI · USA76.358.6147582.78516
🇨🇳 Qwen3.7-MaxAlibaba · China72.660.6147569.791.6817
🇺🇸 Claude Sonnet 5Anthropic · USA71.863.280.48312
🇨🇳 Kimi K2.6Moonshot AI · China68.458.6146066.789.6789
🇨🇳 DeepSeek V4-ProDeepSeek · China66.155.4145067.993.5829
🇺🇸 Gemini 3 ProGoogle · USA65.843.3150154.28415
🇺🇸 Claude Sonnet 4.6Anthropic · USA62.0143059.1809
🇺🇸 MAI-1-previewMicrosoft · USA49.452.846.087.784.2
🇺🇸 Claude Mythos 5Anthropic · USA80.0
🇺🇸 Llama 4 Maverick (llama-4-maverick-17b-128e-instruct)Meta · USA137043.4705
🇺🇸 Grok 4.3xAI · USA149679.48416
🇪🇺 Mistral Large 3Mistral AI · Europe141834.4726
🇪🇺 Magistral Medium 1.2Mistral AI · Europe75.076.264

Quality = equal-thirds average of SWE-bench-Pro (SW development) + LMArena (human preference) + Terminal-Bench (agentic capability), the three with reliable sources (Zendoric Quality); if one is missing its weight is shared among those present (at least two; otherwise «—»). LiveCodeBench and GPQA are shown for reference (indicative, may be incomplete) but are NOT in the index; ARC-AGI-2 (arcprize.org) tracks AGI progress: models score VERY low → still far from AGI. %, except LMArena (Elo).
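The weighting rule above can be sketched as follows. This is a minimal illustration, not Zendoric's own code: the function name is hypothetical, and the LMArena input is assumed to be already normalised to 0-100, because the page does not specify how the Elo score is normalised.

```python
from typing import Optional


def zendoric_quality(
    swe_bench_pro: Optional[float],
    lmarena_norm: Optional[float],
    terminal_bench: Optional[float],
) -> Optional[float]:
    """Equal-weight mean of the components that are present.

    Returns None when fewer than two components are available.
    """
    present = [
        score
        for score in (swe_bench_pro, lmarena_norm, terminal_bench)
        if score is not None
    ]
    if len(present) < 2:
        return None
    # Sharing a missing component's weight among the rest equals averaging the rest.
    return round(sum(present) / len(present), 1)


# Two components present: SWE-bench-Pro 52.8 and Terminal-Bench 46.0, no LMArena.
print(zendoric_quality(52.8, None, 46.0))
# Only one component present (hypothetical value), so no index is reported.
print(zendoric_quality(None, 70.0, None))
```

The two-component inputs are the SWE-bench-Pro and Terminal-Bench figures from the MAI-1-preview row, and their mean reproduces that row's listed index of 49.4.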

💵 Economics (USD / 1M tokens)

Model | Maker · Region | Input | Cache | Output
🇺🇸 Claude Fable 5 | Anthropic · USA | $10.0 | $1.0 | $50.0
🇺🇸 GPT-5.6 Sol (preview) | OpenAI · USA | $5.0 | $0.5 | $30.0
🇨🇳 GLM-5.2 | Zhipu AI · China | $0.6 | $0.26 | $2.2
🇺🇸 Claude Opus 4.8 | Anthropic · USA | $5.0 | $0.5 | $25.0
🇺🇸 GPT-5.5 | OpenAI · USA | $5.0 | $0.5 | $30.0
🇨🇳 Qwen3.7-Max | Alibaba · China | $1.2 | $0.25 | $6.0
🇺🇸 Claude Sonnet 5 | Anthropic · USA | $2.0 until Aug 31, 2026; $3.0 from Sep 1, 2026 | $0.2 until Aug 31, 2026; $0.3 from Sep 1, 2026 | $10.0 until Aug 31, 2026; $15.0 from Sep 1, 2026
🇨🇳 Kimi K2.6 | Moonshot AI · China | $0.6 | $0.16 | $2.5
🇨🇳 DeepSeek V4-Pro | DeepSeek · China | $0.28 | $0.03 | $0.87
🇺🇸 Gemini 3 Pro | Google · USA | $1.25 | $0.31 | $10.0
🇺🇸 Claude Sonnet 4.6 | Anthropic · USA | $3.0 | $0.3 | $15.0
🇺🇸 MAI-1-preview | Microsoft · USA | | |
🇺🇸 Claude Mythos 5 | Anthropic · USA | $10.0 | $1.0 | $50.0
🇺🇸 Llama 4 Maverick (llama-4-maverick-17b-128e-instruct) | Meta · USA | $0.2 | | $0.6
🇺🇸 Grok 4.3 | xAI · USA | $3.0 | $0.75 | $15.0
🇪🇺 Mistral Large 3 | Mistral AI · Europe | $2.0 | | $6.0
🇪🇺 Magistral Medium 1.2 | Mistral AI · Europe | $0.5 | | $1.5

Claude Sonnet 5: scheduled price increase (same model) — reduced pricing until Aug 31, 2026 and standard pricing from Sep 1, 2026.
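To make the per-million-token prices and the scheduled Claude Sonnet 5 change concrete, here is a small sketch that prices one request on either side of the switch date. The prices and dates come from this page; the names are hypothetical, and the billing rule (cached tokens at the cache rate, the rest of the input at the input rate) is an assumption to check against each provider's official pricing.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Price:
    """Prices in USD per 1M tokens."""
    input_usd: float
    cache_usd: float
    output_usd: float


# Claude Sonnet 5 prices as listed on this page.
REDUCED = Price(input_usd=2.0, cache_usd=0.2, output_usd=10.0)
STANDARD = Price(input_usd=3.0, cache_usd=0.3, output_usd=15.0)
SWITCH_DATE = date(2026, 9, 1)


def sonnet5_price(on: date) -> Price:
    """Pick the price list in force on a given day."""
    return STANDARD if on >= SWITCH_DATE else REDUCED


def request_cost(price: Price, input_tokens: int, cached_tokens: int,
                 output_tokens: int) -> float:
    """Cost in USD, assuming cached tokens are billed at the cache rate
    and only the remaining input tokens at the input rate."""
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * price.input_usd
             + cached_tokens * price.cache_usd
             + output_tokens * price.output_usd)
    return total / 1_000_000


# Same workload priced the day before and the day of the change.
for day in (date(2026, 8, 31), date(2026, 9, 1)):
    cost = request_cost(sonnet5_price(day), 1_000_000, 200_000, 100_000)
    print(day.isoformat(), f"${cost:.2f}")
```

Because all three rates rise by the same proportion, any fixed workload costs 50% more from Sep 1, 2026.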

🔓 Open source & type

Model | Maker · Region | Open source | License | Type
🇺🇸 Claude Fable 5 | Anthropic · USA | No | Proprietary | Proprietary (API only)
🇺🇸 GPT-5.6 Sol (preview) | OpenAI · USA | No | Proprietary | Proprietary (API only)
🇨🇳 GLM-5.2 | Zhipu AI · China | Yes | MIT | Open-weight
🇺🇸 Claude Opus 4.8 | Anthropic · USA | No | Proprietary | Proprietary (API only)
🇺🇸 GPT-5.5 | OpenAI · USA | No | Proprietary | Proprietary (API only)
🇨🇳 Qwen3.7-Max | Alibaba · China | No | Proprietary | Proprietary (API only)
🇺🇸 Claude Sonnet 5 | Anthropic · USA | No | Proprietary | Proprietary (API only)
🇨🇳 Kimi K2.6 | Moonshot AI · China | Yes | Modified MIT | Open-weight
🇨🇳 DeepSeek V4-Pro | DeepSeek · China | Yes | MIT | Open-weight
🇺🇸 Gemini 3 Pro | Google · USA | No | Proprietary | Proprietary (API only)
🇺🇸 Claude Sonnet 4.6 | Anthropic · USA | No | Proprietary | Proprietary (API only)
🇺🇸 MAI-1-preview | Microsoft · USA | No | Proprietary | Proprietary (API only)
🇺🇸 Claude Mythos 5 | Anthropic · USA | No | Proprietary | Proprietary (API only)
🇺🇸 Llama 4 Maverick (llama-4-maverick-17b-128e-instruct) | Meta · USA | Yes | Llama 4 Community | Open-weight
🇺🇸 Grok 4.3 | xAI · USA | No | Proprietary | Proprietary (API only)
🇪🇺 Mistral Large 3 | Mistral AI · Europe | Yes | Apache-2.0 | Open-weight
🇪🇺 Magistral Medium 1.2 | Mistral AI · Europe | Yes | Apache-2.0 | Open-weight

🖥️ Open source you can self-host

Small/medium models you can run on your own machine (laptop/PC/Mac). Quality = Artificial Analysis Intelligence Index (0-100; output quality), the measure with best coverage of small open models (LMArena does not list sub-32B). Memory estimated at 4-bit (Q4) and 8-bit (Q8) quantization; on Apple Silicon it is UNIFIED memory (RAM=VRAM).

Model | Maker | Quality (AA Index) | GPQA | Params | RAM Q4 | RAM Q8 | GPU | CPU / Mac | License
Qwen3.5-27B | Alibaba | 42 | 85.5 | 27B | 17 GB | 32 GB | ≥24 GB | Limited (GPU or Mac ≥32 GB preferred) | Apache-2.0
Gemma 4 31B | Google | 39 | 84.3 | 31B | 18 GB | 35 GB | ≥24 GB | Limited (GPU or Mac ≥32 GB preferred) | Gemma
Qwen3.5-35B-A3B | Alibaba | 37 | 84.2 | 35B | 21 GB | 40 GB | ≥24 GB | Limited (GPU or Mac ≥32 GB preferred) | Apache-2.0
Gemma 4 26B A4B | Google | 31 | 82.3 | 26B | 15 GB | 29 GB | ≥16 GB | Limited (GPU or Mac ≥32 GB preferred) | Gemma
NVIDIA Nemotron-Cascade-2-30B-A3B | NVIDIA | 28 | 76.1 | 30B | 18 GB | 34 GB | ≥24 GB | Limited (GPU or Mac ≥32 GB preferred) | NVIDIA Open Model
gpt-oss-20b | OpenAI | 24 | 71.5 | 20B | 13 GB | 25 GB | ≥16 GB | Limited (GPU or Mac ≥32 GB preferred) | Apache-2.0
Gemma 4 12B | Google | 22 | 78.8 | 12B | 8 GB | 15 GB | ≥8 GB | Yes (slow on CPU · Mac 16 GB) | Gemma
Gemma 4 E4B | Google | 19 | 58.6 | 4B | 6 GB | 10 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Gemma
Gemma 4 E2B | Google | 15 | 43.4 | 2B | 4 GB | 7 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Gemma
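One way to use the memory columns is to check which models fit a given machine. The sketch below does that with the Q4 and Q8 figures from the table above; the function name and the `headroom_gb` reserve for the operating system and context are hypothetical, so adjust them to your own setup.

```python
# (model, RAM at Q4 in GB, RAM at Q8 in GB), copied from the table above.
MODELS = [
    ("Qwen3.5-27B", 17, 32),
    ("Gemma 4 31B", 18, 35),
    ("Qwen3.5-35B-A3B", 21, 40),
    ("Gemma 4 26B A4B", 15, 29),
    ("NVIDIA Nemotron-Cascade-2-30B-A3B", 18, 34),
    ("gpt-oss-20b", 13, 25),
    ("Gemma 4 12B", 8, 15),
    ("Gemma 4 E4B", 6, 10),
    ("Gemma 4 E2B", 4, 7),
]


def models_that_fit(memory_gb: float, headroom_gb: float = 4.0) -> list[tuple[str, str]]:
    """Return (model, quantization) pairs whose estimate fits the budget.

    headroom_gb is an assumed reserve, not a figure from the page.
    Q8 is preferred when it fits, otherwise Q4.
    """
    usable = memory_gb - headroom_gb
    fits = []
    for name, q4_gb, q8_gb in MODELS:
        if q8_gb <= usable:
            fits.append((name, "Q8"))
        elif q4_gb <= usable:
            fits.append((name, "Q4"))
    return fits


# Example: a machine with 16 GB of unified memory.
print(models_that_fit(16))
```

On Apple Silicon the budget is the unified memory; on a PC with a discrete GPU, pass the VRAM instead if the model should run entirely on the GPU.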

🗄️ Large open source (server / multi-GPU)

Powerful open models that need a server or multiple GPUs. Quality = LMArena Elo (human preference over output, source lmarena.ai), which does cover large models. For MoE, memory counts total parameters (all experts are loaded). Memory estimated at 4-bit (Q4) and 8-bit (Q8) quantization; on Apple Silicon it is UNIFIED memory (RAM=VRAM).

Model | Maker | Quality (LMArena) | GPQA | Params | RAM Q4 | RAM Q8 | GPU | CPU / Mac | License
DeepSeek-V4-Pro | DeepSeek | 1465 | 90.1 | 1600B | 882 GB | 1762 GB | 12× 80 GB (server) | No (GPU server) | MIT
Kimi K2.6 | Moonshot AI | 1460 | 90.5 | 1100B | 552 GB | 1102 GB | 7× 80 GB (server) | No (GPU server) | Modified MIT
Qwen3.5-397B-A17B | Alibaba | 1450 | 88.4 | 397B | 220 GB | 438 GB | 3× 80 GB (server) | No (GPU server) | Apache-2.0
Llama 4 Maverick (llama-4-maverick-17b-128e-instruct) | Meta | 1420 | 69.8 | 400B | 222 GB | 442 GB | 3× 80 GB (server) | No (GPU server) | Llama 4 Community
Mistral Large 3 | Mistral AI | 1416 | 43.9 | 675B | 373 GB | 744 GB | 5× 80 GB (server) | No (GPU server) | Apache-2.0
GLM-5.2 | Zhipu AI | 1360 | 91.2 | 744B | 411 GB | 820 GB | 6× 80 GB (server) | No (GPU server) | MIT
gpt-oss-120b | OpenAI | 1353 | 80.1 | 117B | 66 GB | 130 GB | ≥80 GB | No (GPU server) | Apache-2.0
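The memory and GPU columns can be approximated from the parameter count. The sketch below is one plausible reconstruction, not the page's published method: the 10% overhead and the 2 GB fixed reserve are assumptions, chosen because they land within about 1 GB of the table for the rows shown, and the GPU count is derived from the Q4 estimate, which is consistent with the GPU column for those rows.

```python
from math import ceil


def estimate_memory_gb(params_billion: float, bits: int,
                       overhead: float = 1.10, fixed_gb: float = 2.0) -> float:
    """Weights (params x bits / 8) scaled by an overhead factor, plus a fixed reserve.

    overhead and fixed_gb are assumptions, not values published by the page.
    For MoE models pass the total parameter count, since all experts are loaded.
    """
    weight_gb = params_billion * bits / 8  # 1B parameters at 8 bits is about 1 GB
    return weight_gb * overhead + fixed_gb


def gpus_needed(memory_gb: float, gpu_gb: float = 80.0) -> int:
    """Smallest number of GPUs whose combined memory holds the model."""
    return ceil(memory_gb / gpu_gb)


for name, params in (("DeepSeek-V4-Pro", 1600), ("GLM-5.2", 744),
                     ("Qwen3.5-397B-A17B", 397)):
    q4 = estimate_memory_gb(params, 4)
    q8 = estimate_memory_gb(params, 8)
    print(f"{name}: Q4 ~{q4:.0f} GB, Q8 ~{q8:.0f} GB, "
          f"{gpus_needed(q4)}x 80 GB at Q4")
```

Real deployments also need room for the KV cache and activations, which grow with context length and batch size, so treat these figures as a lower bound.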