Rankings

LLM rankings

1051 models from 37 providers across 8 categories. One clear metric per list.


Categories

Each leaderboard uses a single scoring rule—open one to see the full table.

How rankings work

One metric per list, no subjective blending.

Every category ranks on a single verifiable spec — context window size, price per token, cost per context token, or declared capabilities like vision and tool use. Only active models are included, so you never compare against retired IDs.
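The ranking rule above can be sketched in a few lines: filter to active models, then sort on exactly one spec. The record fields and model names below are illustrative assumptions, not the site's actual schema.

```python
# Minimal sketch of single-metric ranking. Field names ("active",
# "context_window") and model entries are hypothetical examples.
models = [
    {"id": "model-a", "active": True,  "context_window": 200_000},
    {"id": "model-b", "active": True,  "context_window": 1_000_000},
    {"id": "model-c", "active": False, "context_window": 2_000_000},  # retired: excluded
]

def rank(models, metric, descending=True):
    """Keep only active models, then sort on exactly one verifiable spec."""
    active = [m for m in models if m["active"]]
    return sorted(active, key=lambda m: m[metric], reverse=descending)

leaderboard = rank(models, "context_window")
print([m["id"] for m in leaderboard])  # → ['model-b', 'model-a']
```

Because the retired model is dropped before sorting, it never appears in the leaderboard regardless of how large its spec is.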

Context windows have grown from tens of thousands to millions of tokens. That helps with big documents and codebases, but long-context quality and pricing still vary widely. Use these lists to match window size and cost to what you actually send on each call.
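Matching cost to what you send can be a quick back-of-the-envelope check. The prices and token counts below are made-up placeholders; substitute the figures from the pricing leaderboards.

```python
# Rough per-call cost check against a model's context window.
# All numbers here are hypothetical, not any specific model's pricing.
def call_cost(input_tokens, output_tokens, in_price_per_mtok, out_price_per_mtok):
    """Estimate one call's cost from token counts and per-million-token prices."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000

context_window = 200_000                      # assumed model spec
input_tokens, output_tokens = 120_000, 4_000  # assumed payload
assert input_tokens + output_tokens <= context_window  # does the call even fit?

cost = call_cost(input_tokens, output_tokens, 3.00, 15.00)
print(f"${cost:.2f}")  # → $0.42
```

A model with a huge window but high per-token pricing can cost far more per call than a smaller-window model that still fits your payload, which is why window size and price are worth checking together.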

FAQ

Common questions about how these lists are built.

We rank on verifiable specs: context size, pricing, max output, and declared capabilities. We do not blend in subjective benchmark scores—they vary by task. Always test on your own workload.