Stanford AI Index 2026: Models Now Clear 50% on Humanity’s Last Exam

Stanford’s annual AI Index dropped today, and the headline number is hard to ignore: the best AI models now answer more than half the questions on Humanity’s Last Exam, a benchmark designed to stump even expert humans. A year ago, that number sat at just 8.8%.

What Is the AI Index?

The Stanford AI Index is the most cited annual snapshot of where artificial intelligence stands across research, industry, and society. Released each April, it pulls together data from hundreds of sources to track AI’s progress across benchmarks, economic impact, policy, and public perception.

Humanity’s Last Exam is one of the toughest tests in AI evaluation. It was created specifically because existing benchmarks had become too easy for frontier models. It asks highly specialized questions across math, science, law, and medicine that require graduate-level reasoning to answer correctly.

Who’s Leading and What the Numbers Show

This year’s report names Anthropic as the top performer. Claude Opus 4.6 leads the rankings on the Humanity’s Last Exam benchmark, with Google’s Gemini 3.1 Pro close behind. xAI and OpenAI round out the top tier. The jump from under 9% to over 50% in a single year reflects a combination of larger training runs, improved reasoning techniques, and better post-training alignment work.

The report also tracks cost. Inference prices have dropped sharply across the board, so today’s more capable models cost less to run than the far weaker models of two years ago. That combination of improved performance and lower cost is accelerating real-world deployment.

The Gap Between Experts and the Public

One of the more striking findings is the split in sentiment. Among AI researchers and practitioners, 56% say they are optimistic about where the technology is heading. Among the general American public, only 10% describe themselves as excited about AI.

That gap matters for policy and adoption. Companies building AI products increasingly face a public that is skeptical or anxious, even as the underlying technology continues to improve faster than most forecasts predicted. Bridging that perception gap will be as important as the next benchmark milestone.

Why This Report Matters Right Now

The 2026 AI Index arrives at a moment when governments, investors, and enterprises are all making large bets on AI’s trajectory. The data gives those decisions a more grounded foundation than hype cycles alone. For anyone tracking the space, today’s release is the closest thing the industry has to a reliable annual checkpoint.

Whether the next year brings another jump, this time from 50% to 80% on Humanity’s Last Exam, or progress slows as the remaining questions prove harder to crack, the Stanford team will be measuring it.
