Rankings

LLM rankings

1051 models from 37 providers across 8 categories. One clear metric per list.


Categories

Each leaderboard uses a single scoring rule—open one to see the full table.

How rankings work

One metric per list, no subjective blending.

Every category ranks on a single verifiable spec — context window size, price per token, cost per context token, or declared capabilities like vision and tool use. Only active models are included, so you never compare against retired IDs.
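The ranking rule above can be sketched in a few lines: filter to active models, then sort on exactly one spec. The record fields and model names below are illustrative assumptions, not the site's actual schema.

```python
# Minimal sketch of single-metric ranking. Field names ("active",
# "context_window") and model entries are hypothetical examples.
models = [
    {"id": "model-a", "active": True,  "context_window": 200_000},
    {"id": "model-b", "active": True,  "context_window": 1_000_000},
    {"id": "model-c", "active": False, "context_window": 2_000_000},  # retired: excluded
]

def rank(models, metric, descending=True):
    """Keep only active models, then sort on exactly one verifiable spec."""
    active = [m for m in models if m["active"]]
    return sorted(active, key=lambda m: m[metric], reverse=descending)

leaderboard = rank(models, "context_window")
print([m["id"] for m in leaderboard])  # → ['model-b', 'model-a']
```

Because the retired model is dropped before sorting, it never appears in the leaderboard regardless of how large its spec is.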

Context windows have grown from tens of thousands to millions of tokens. That helps with big documents and codebases, but long-context quality and pricing still vary widely. Use these lists to match window size and cost to what you actually send on each call.
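Matching cost to what you send can be a quick back-of-the-envelope check. The prices and token counts below are made-up placeholders; substitute the figures from the pricing leaderboards.

```python
# Rough per-call cost check against a model's context window.
# All numbers here are hypothetical, not any specific model's pricing.
def call_cost(input_tokens, output_tokens, in_price_per_mtok, out_price_per_mtok):
    """Estimate one call's cost from token counts and per-million-token prices."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000

context_window = 200_000                      # assumed model spec
input_tokens, output_tokens = 120_000, 4_000  # assumed payload
assert input_tokens + output_tokens <= context_window  # does the call even fit?

cost = call_cost(input_tokens, output_tokens, 3.00, 15.00)
print(f"${cost:.2f}")  # → $0.42
```

A model with a huge window but high per-token pricing can cost far more per call than a smaller-window model that still fits your payload, which is why window size and price are worth checking together.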

FAQ

Common questions about how these lists are built.

We rank on verifiable specs: context size, pricing, max output, and declared capabilities. We do not blend in subjective benchmark scores—they vary by task. Always test on your own workload.