Coding score
Difficulty-centered composite: each result counts as its deviation from that benchmark’s
observed average, so being measured on a hard benchmark doesn’t hurt and a generously scored
one doesn’t inflate. Deviations are weighted over the full benchmark set and anchored to the
dataset average. A missing benchmark counts at its average (zero deviation), so sparse
coverage dilutes the score toward the mean. The basis (e.g. “3/6”) shows coverage only; the
weighting denominator is always the full set.
Self-reported results count at 75% of their deviation. Independent benchmarks weigh more
than vendor-run ones:
- SWE-bench Verified: ×1 (mixed)
- GSO: ×1 (independent)
- Terminal-Bench 2.0: ×1.5 (independent)
- DeepSWE: ×1.5 (independent)
- AA Coding Index: ×1.5 (independent)
- CursorBench: ×0.5 (vendor-self-reported)
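The composite described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual implementation: the function names, the `anchor` and `benchmark_means` inputs, and the choice to measure deviations in raw score points (rather than standardized units) are all assumptions. It also assumes the 75% self-reported factor is applied per result, separately from the per-benchmark weight.

```python
# Benchmark weights as listed above.
WEIGHTS = {
    "SWE-bench Verified": 1.0,
    "GSO": 1.0,
    "Terminal-Bench 2.0": 1.5,
    "DeepSWE": 1.5,
    "AA Coding Index": 1.5,
    "CursorBench": 0.5,
}
SELF_REPORTED_FACTOR = 0.75  # self-reported results count at 75% of their deviation


def coding_score(results, benchmark_means, anchor, self_reported=frozenset()):
    """Hypothetical sketch of the difficulty-centered composite.

    results:         {benchmark name: model's score} (may omit benchmarks)
    benchmark_means: {benchmark name: observed average on that benchmark}
    anchor:          dataset average the composite is anchored to (assumed input)
    self_reported:   names of benchmarks whose result for this model is self-reported
    """
    total_weight = sum(WEIGHTS.values())  # always the full set, not just covered ones
    weighted_deviation = 0.0
    for name, weight in WEIGHTS.items():
        if name not in results:
            continue  # missing benchmark counts at its average: zero deviation
        deviation = results[name] - benchmark_means[name]
        if name in self_reported:
            deviation *= SELF_REPORTED_FACTOR
        weighted_deviation += weight * deviation
    return anchor + weighted_deviation / total_weight


def basis(results):
    """Coverage label such as '3/6'; informational only."""
    covered = sum(1 for name in WEIGHTS if name in results)
    return f"{covered}/{len(WEIGHTS)}"
```

Because the divisor is the total weight of all six benchmarks, a model with a single strong result moves only a fraction of the way from the anchor, which is the dilution effect described above.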
Value index
Coding score ÷ log₁₀(effective cost), where effective cost is the price of the cheapest tier
whose estimated monthly quota covers your usage profile. Costs are floored at $10 so near-free
API pricing can’t dominate the index. API-only cost assumes 80% input / 20% output tokens.
Usage profiles:
- Light 5M tokens/mo
- Daily 30M tokens/mo
- Heavy 150M tokens/mo
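A rough sketch of the value index calculation, under stated assumptions: tiers are represented as hypothetical `(monthly_price_usd, estimated_monthly_token_quota)` pairs, API prices are quoted per million tokens, and all function names are illustrative rather than taken from the site.

```python
import math

# Usage profiles in tokens per month, as listed above.
PROFILES = {"Light": 5_000_000, "Daily": 30_000_000, "Heavy": 150_000_000}
COST_FLOOR_USD = 10.0


def api_cost(tokens, input_price_per_m, output_price_per_m):
    """Monthly API-only cost, assuming an 80% input / 20% output token split."""
    millions = tokens / 1_000_000
    return millions * (0.8 * input_price_per_m + 0.2 * output_price_per_m)


def effective_cost(tiers, tokens):
    """Price of the cheapest tier whose estimated quota covers the usage.

    tiers: iterable of (monthly_price_usd, estimated_monthly_token_quota).
    Returns None when no tier covers the profile.
    """
    covering = [price for price, quota in tiers if quota >= tokens]
    return min(covering) if covering else None


def value_index(score, cost_usd):
    """Coding score divided by log10 of the floored monthly cost."""
    return score / math.log10(max(cost_usd, COST_FLOOR_USD))


# Example with made-up tiers for a single hypothetical vendor.
tiers = [(20, 10_000_000), (100, 50_000_000), (200, 200_000_000)]
daily_cost = effective_cost(tiers, PROFILES["Daily"])
daily_value = value_index(60.0, daily_cost)
```

A side effect of the $10 floor is that log₁₀(cost) is never below 1, so the division cannot blow up or change sign for very cheap plans.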
Vendors express quotas in incompatible units (prompts per 5h, weekly hours, credits) — each
tier carries its conversion assumption verbatim in the expanded row, and each plan a
quota-confidence label (high/medium/low) for how solid the vendor’s published numbers are.
The label never changes a score. Treat estimates as directional, not exact.
Reliability
70% curated editorial judgment per provider (incident/degradation history, quota pain, limit
transparency — rationale in the expanded row) + 30% measured behavior of the shown model,
the mean of three independent Artificial Analysis evals: non-hallucination rate
(AA-Omniscience — admitting ignorance instead of confabulating), IFBench (instruction
following) and τ²-Bench (agentic tool-calling dependability).
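The 70/30 blend above can be written out as a short sketch. It assumes all four inputs share a common 0 to 100 scale and that all three evals are available; how the site handles a missing eval is not stated, so this sketch does not attempt it. The function and parameter names are illustrative.

```python
def reliability(editorial, non_hallucination, ifbench, tau2_bench):
    """Blend of editorial judgment (70%) and measured behavior (30%).

    All inputs are assumed to be on the same 0-100 scale.
    """
    # Measured behavior is the plain mean of the three evals.
    measured = (non_hallucination + ifbench + tau2_bench) / 3
    return 0.7 * editorial + 0.3 * measured
```

With this weighting, a 10-point change in the editorial rating moves the result by 7 points, while a 10-point change in any single eval moves it by 1 point.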
Every price, quota and benchmark entry links its source and carries a verification date.
Data last verified Jun 9, 2026. Prices in USD as quoted by vendors; quarterly/annual plans
converted to an effective monthly price.
© 2026 AICODA
All scores are our evaluation and may change.