Anthropic’s Claude Mythos Preview: Too Dangerous to Release
Anthropic has built the most capable AI model it has ever created, and decided not to release it to the public. Claude Mythos Preview can autonomously discover and exploit security vulnerabilities in software used by billions of people, and Anthropic believes that power is too dangerous to hand over freely.
A New Kind of AI Model
For years, AI tools have assisted security researchers by suggesting code, flagging known weaknesses, or summarizing vulnerability reports. But these tools still required significant human expertise to be useful. A skilled researcher could move faster with AI assistance, but the AI itself was not driving the investigation.
Claude Mythos Preview changes that. Anthropic describes it as a fundamentally new model class, with state-of-the-art performance in cybersecurity, software engineering, and complex reasoning. The model was first exposed in late March 2026 through an accidental data leak, then officially unveiled as a preview on April 8.
What Mythos Found on Its Own
Anthropic’s red team reports that Mythos Preview has autonomously identified thousands of high-severity vulnerabilities across every major operating system and web browser. Among them: a 17-year-old remote code execution flaw in FreeBSD (CVE-2026-4747) that lets attackers gain root access on machines running NFS, a 27-year-old bug in OpenBSD, and a 16-year-old vulnerability in FFmpeg. The model did not just find these issues — it exploited them without human guidance.
That last detail is what sets Mythos apart. Identifying a vulnerability and successfully exploiting it are two very different tasks. The fact that an AI can now close that gap autonomously has alarmed even its own creators.
Project Glasswing: Offense as Defense
Rather than shelve the model, Anthropic launched Project Glasswing, a consortium designed to use Mythos Preview for defensive purposes before malicious actors can develop similar capabilities. AWS, Apple, Google, JPMorganChase, Microsoft, and Nvidia are among the launch partners, each receiving early access to patch vulnerabilities in critical software. Anthropic is committing up to $100 million in model usage credits and $4 million in direct donations to open-source security organizations to back the effort.
Why This Matters
The decision to withhold a frontier model from public release is almost without precedent. It reflects a serious internal judgment: some AI capabilities are simply too asymmetric to release openly, because in security, offense is cheap and defense is hard. The Cloud Security Alliance has already warned IT teams to prepare for a surge in AI-discovered vulnerabilities, regardless of whether Mythos itself reaches the public.
What happens next will set a template for how the AI industry handles models that cross this threshold. If Project Glasswing succeeds, it may prove that the companies most at risk from AI-enabled cyberattacks are also the ones best positioned to stop them. If it does not, the next major breach could be the one that forces a much harder conversation.
