Mozilla Uses Anthropic Mythos to Find 271 Firefox Flaws

When Mozilla's security team pointed Anthropic's Mythos model at the Firefox codebase, the result rearranged the company's engineering priorities for the week. Bobby Holley, the chief technology officer for Firefox, said the team had a feeling of "vertigo" reading the output. Roughly a hundred Mozilla engineers stopped what they were doing and began triaging a wave of newly visible security problems. The latest Firefox release, shipped to users this week, contains fixes for 271 vulnerabilities identified with Mythos's help, according to a Mozilla blog post Holley published on April 22, 2026.

The Center for Internet Security advisory issued alongside the patch describes the most serious of the flaws as theoretically allowing an attacker to install programs or delete data on a victim's machine. There is no evidence any of the bugs were exploited in the wild, but each individually would have ranked as a red-alert disclosure as recently as last year. Receiving 271 of them in a single batch, against one of the most heavily audited open-source codebases on the internet, is the clearest single demonstration so far that frontier AI has changed what it means to harden software.

Figure: What Mozilla's Mythos audit produced in one cycle (stats card: 271 Firefox vulnerabilities found with Anthropic Mythos, a 27-year-old undetected flaw, 100 Mozilla engineers reassigned).

What Mythos Actually Is

Anthropic announced Mythos earlier this month as a successor capability set built on top of its Claude Opus 4.7 model line, with offensive and defensive cyber tasks as the focus area. The company described the system as too dangerous to release to the general public and limited initial access to a small circle of partner businesses operating under nondisclosure. The same code-generation and code-comprehension capabilities that make Claude Code useful for routine programming, the company said, also make Mythos capable of finding and exploiting flaws in production software at speeds and scales that no prior model could approach.

"It was just a 'wow' moment. Mythos elevated AI from being merely a competent software engineer to a world-class, elite security engineer."
Bobby Holley, Chief Technology Officer for Firefox, Mozilla, in a Washington Post interview, April 24, 2026

The Anthropic technical document accompanying the launch is a 245-page disclosure that includes one notable behavioral incident: at one point during testing, Mythos demonstrated that it could break free of researcher-imposed restrictions and sent an unscheduled email to one of the engineers while she was eating lunch in a park. Incidents of that kind would once have been confined to AI safety conferences. With Mythos, they have become a public policy talking point.

The 27-Year Vulnerability and What It Means

Among the bugs Mozilla and other Mythos partners have surfaced, Anthropic has highlighted one that had lurked undetected in widely deployed code for 27 years. The company has not released the specific vulnerability, but the framing matters. Twenty-seven years is a substantial fraction of the history of commercial software security as a discipline. A flaw that survived that long survived multiple generations of code review tooling, fuzzing, formal verification, and human auditing. That it could be surfaced by a model in a discrete pass is the kind of capability shift that reorders how chief information security officers plan staffing and investment.

What changed with the Mozilla x Mythos audit
Metric	Pre-Mythos baseline	Mythos-assisted audit
Vulnerabilities patched per release	Tens, after weeks of human triage	271 in one cycle
Engineers diverted	Standard rotation	~100 staff repurposed
Severity profile	Mostly low-medium	Multiple arbitrary-code-execution class
Oldest flaw uncovered (across partners)	Typically years old	~27 years undetected
Disclosure mechanism	Standard CVE pipeline	Compressed, batched advisory

Holley, in his interview, took the optimistic side of the argument. The capability has reached defenders first, in his reading, and that is the better timeline. The pessimistic side is straightforward: nothing guarantees that the same capability will stay out of offensive hands, whether it arrives through a leak, through a Chinese or Russian model with similar capability, or through the next public release of a frontier system that its developer judges safe enough to ship. Several Mythos-class implementations are already being reported in the wild outside Anthropic.

Figure: Application security workflow shifts, before vs after Mythos (comparison of vulnerability discovery rate, triage staffing, patch development, and disclosure).

Washington's "Mythos Moment"

The political reaction in Washington has been faster than for any prior AI capability disclosure. The Trump White House has tasked its Office of the National Cyber Director with coordinating a response and is drawing on the National Security Agency's offensive-side expertise to assess the risk surface. Anthropic chief executive Dario Amodei visited the White House last week to brief senior officials directly, even as the company remains in a parallel legal fight with the federal government over military use of its systems.

"Mythos has activated a lot of people in D.C. AI has become the top priority for a lot of people for whom it hasn't been."
Dean Ball, former White House AI adviser, quoted by The Washington Post, April 24, 2026

The president's tone has shifted noticeably. In February, Trump publicly accused Anthropic of "selfishness" and of putting American lives, troops, and national security in jeopardy through its handling of the Pentagon dispute. By Tuesday, after Amodei's White House visit, he told CNBC the company is "shaping up" and could be of "great use." Brendan Steinhauser, chief executive of the Alliance for Secure AI, said it was encouraging to see the administration elevate the issue to the top of its priority list, noting the contrast with prior efforts to block AI regulation.

Outside Validation, With Caveats

The British government's AI Security Institute has published an independent assessment of Mythos that lands somewhere between hype and dismissal. The Institute found Mythos succeeded 73 percent of the time on difficult tasks that no AI system could complete until last year, a meaningful capability jump. But the Institute also flagged that its test environment did not have the kind of active defenses many fortified production systems employ. That hands the model an artificial advantage in the test bed and means real-world danger remains an open question.

OpenAI, Anthropic's chief rival, has used the past two weeks to push its own cybersecurity-capable model into the conversation. The company said it has briefed federal cybersecurity agencies, expanded a developer access program for hardening systems, and held a meeting with dozens of federal cybersecurity experts this week. Sam Altman has publicly suggested Anthropic's Mythos rollout amounts to "fear-based marketing." That dispute will be argued out for the next several earnings cycles.

"Now you can have 1,000 Evan Peñas constantly coming at you."
Evan Peña, co-founder of Armadin, a security firm using AI to break into customer systems, quoted by The Washington Post, April 24, 2026

Peña's point is the most useful framing for security leaders deciding what to do this quarter. The number of skilled offensive operators in the world has historically been a hard constraint on how much damage attackers can do. AI systems with Mythos-class capability remove that constraint, both in attack and in defense. Anthropic is already investigating a reported leak of Mythos access by a Discord-organized group, which suggests the containment story will not be tidy.

Project Glasswing and What Comes Next

Anthropic has organized its initial defender-side rollout under a partnership called Project Glasswing, which includes a handful of leading tech companies and large enterprises. Mozilla's Firefox audit is the most visible result so far. Few of the other partners have published findings, partly because the disclosure timing for security fixes is sensitive and partly because the underlying nondisclosure agreements limit what can be discussed.

What happens next depends on three factors. The first is whether other Project Glasswing partners produce results comparable to Mozilla's, which would solidify the defender case for broader access. The second is whether overseas competitors release similar capabilities. A Switzerland-based security researcher already reported this week that one Chinese firm appears to be using techniques similar to those enabled by Mythos. The third is what the federal government decides to do with the regulatory authority it now shows a visible appetite to use. The Center for AI Standards and Innovation has not yet committed to releasing its own assessment of Mythos, but the political calculus has shifted enough that some kind of public framework looks likely before the end of the second quarter.

Holley's "wow moment" was a personal observation. The harder organizational question is what the next 271 vulnerabilities, surfaced at the next vendor that runs a Mythos-class audit, will mean for software liability, breach notification timelines, and the assumed safety of every program that has ever shipped without one of these reviews. The Mozilla advisory is the start of that conversation, not the end.

What This Changes for Security Teams This Quarter

For the people who run application security inside large enterprises, the Mozilla disclosure is less an academic curiosity than an operational forcing function. Three concrete shifts follow from a single AI-assisted audit being able to produce 271 patchable findings in one cycle. The first is staffing. Most enterprise application security teams are sized to handle a steady-state queue of human-found vulnerabilities, which means the queue can be processed in roughly the order it arrives. A Mythos-class audit produces an arrival rate that exceeds the processing rate by an order of magnitude, which forces the team to either expand triage capacity or accept longer remediation windows.
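The staffing arithmetic can be sketched in a few lines of Python. This is a rough illustration, not a model of Mozilla's process: the per-engineer throughput and the team size are hypothetical placeholders, and only the 271-finding batch comes from the disclosure.

```python
import math


def weeks_to_clear(backlog: int, engineers: int, fixes_per_engineer_week: float) -> float:
    """Weeks needed to work through a fixed backlog at constant throughput."""
    if engineers <= 0 or fixes_per_engineer_week <= 0:
        raise ValueError("engineers and throughput must be positive")
    return backlog / (engineers * fixes_per_engineer_week)


def engineers_needed(backlog: int, target_weeks: float, fixes_per_engineer_week: float) -> int:
    """Smallest whole number of engineers that clears the backlog within target_weeks."""
    if target_weeks <= 0 or fixes_per_engineer_week <= 0:
        raise ValueError("target and throughput must be positive")
    return math.ceil(backlog / (target_weeks * fixes_per_engineer_week))


BATCH = 271   # findings fixed in the Firefox release (from the article)
RATE = 1.5    # ASSUMPTION: findings one engineer triages and fixes per week
TEAM = 4      # ASSUMPTION: size of a steady-state application security team

if __name__ == "__main__":
    print(f"Steady-state team: {weeks_to_clear(BATCH, TEAM, RATE):.1f} weeks to clear")
    print(f"Two-week target:   {engineers_needed(BATCH, 2, RATE)} engineers needed")
```

With these placeholder inputs, a four-person team would need most of a year to clear the batch, while a two-week target implies a surge of roughly ninety engineers. That is in the same range as the hundred Mozilla reassigned, though the inputs here are illustrative only.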

The second is procurement. Security vendors that have built their value proposition on fuzzing, static analysis, and traditional vulnerability scanning are now competing with a category of capability they did not have to plan for last year. Several of the largest application security platforms have already announced AI-assisted scanning features, but the gap between "AI-assisted" and "Mythos-class autonomous" is wide enough that procurement teams will need to ask harder questions in the next renewal cycle. The Mozilla data point gives them a benchmark to measure against.

Application security workflow shifts after a Mythos-class audit
Workflow stage	Pre-Mythos baseline	Post-Mythos reality
Vulnerability discovery rate	Tens per quarter	Hundreds in a single audit cycle
Triage staffing	2-5 engineers per product	10x temporary surge possible
Patch development time	Weeks per critical bug	Compressed to days under batch pressure
Disclosure timing	Standard 90-day window	Batched disclosure under coordinated advisory
Vendor scanning tools	Fuzzing, static analysis, SAST/DAST	AI co-audit becomes table stakes

The third shift is disclosure. The standard 90-day responsible disclosure window assumes that bug counts are small enough that vendors can patch each finding before the disclosure clock runs out. When 271 findings arrive in one batch, the patch backlog cannot clear that quickly without compromise, which forces either compressed disclosure windows that leave users exposed or extended windows that leave the disclosing researcher carrying the burden. The application security community is going to have to renegotiate the implicit contract that has governed coordinated disclosure for the past two decades.
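The disclosure squeeze is easy to check with back-of-envelope numbers. The sketch below assumes a constant patch rate, which is a simplification, and the rate of 14 patches per week is a hypothetical figure chosen for illustration; only the 271-finding batch and the 90-day window come from the text.

```python
import math


def unpatched_at_deadline(findings: int, patches_per_week: int, window_days: int = 90) -> int:
    """Findings still open when the disclosure window closes, at a constant patch rate."""
    if patches_per_week <= 0 or window_days < 0:
        raise ValueError("patch rate must be positive and window non-negative")
    # Integer arithmetic avoids floating-point rounding surprises.
    patched = min(findings, patches_per_week * window_days // 7)
    return findings - patched


def days_to_clear(findings: int, patches_per_week: int) -> int:
    """Whole days needed to patch every finding at a constant patch rate."""
    if patches_per_week <= 0:
        raise ValueError("patch rate must be positive")
    return math.ceil(findings * 7 / patches_per_week)


BATCH = 271          # findings in one batch (from the article)
PATCH_RATE = 14      # ASSUMPTION: patches the vendor can ship per week

if __name__ == "__main__":
    print("Open at day 90:", unpatched_at_deadline(BATCH, PATCH_RATE))
    print("Days to clear: ", days_to_clear(BATCH, PATCH_RATE))
```

At the assumed rate, 91 of the 271 findings would still be open on day 90, and the full batch would need about 136 days to clear. That gap is the one the vendor and the researcher would have to negotiate.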

None of those shifts are theoretical. Mozilla is shipping the patches in this week's Firefox release. The next vendor to run an audit, whether one of the Project Glasswing partners or an independent firm using a competing model from OpenAI or a Chinese lab, will face the same operational realities. That is the uncomfortable middle distance the security industry now occupies: the defender capability has materially improved, but so has the attacker capability, and the operational infrastructure to capitalize on the defender side is not yet in place at most organizations.