Safety Theater Turns into a Hostage Situation in a Week

Anthropic’s responsible-development framework, the brand differentiator behind a $965 billion valuation, lasted days against a friendly adversary. Its safety guardrails failed. Its government evaluators failed. And then the White House, having been notified of the failure by the company’s own largest investor, used that failure to hold the models for a political ransom: that the company accept public blame.

The security concerns were real. Mythos 5, Anthropic’s most powerful model, can carry out potent cyberattacks, a capability that has worried White House and private-sector leaders for months. Fable 5, its general-purpose variant designed to give the public a constrained version of that capability, was released nine days ago with guardrails Anthropic presented as the responsible way to bridge the gap between what Mythos can do and what the public should be permitted to do with it. The guardrails were the safety framework. They were the product pitch. They were what made releasing a model built on a cyberattack engine to the general public defensible.

Amazon researchers found a way around them in days.

Not months. Not after a sustained campaign by a state adversary with dedicated resources and institutional patience. Days — against the company’s own business partner, on infrastructure Amazon itself hosts. Anthropic itself said the vulnerabilities “seemed relatively simple.” The responsible-development framework that distinguishes Anthropic from every other company racing to deploy increasingly powerful models failed its first contact with a friendly adversary.

What followed matters more than the bypass. Andy Jassy, Amazon’s CEO, called administration officials including Treasury Secretary Scott Bessent to flag the issue. Matt Garman, who runs AWS, entered discussions with government officials. The Trump administration restricted foreign access to Fable 5 and Mythos 5 on Friday. To comply, Anthropic pulled access to both models for all users. By Monday, negotiations were underway between senior Anthropic technical staff (Nicholas Carlini, a security researcher; Logan Graham, who leads risk evaluation; and Dave Orr, head of safeguards) and officials from the Commerce Department and National Cyber Director Sean Cairncross’s office.

The structure of those negotiations is the story — and it is two stories at once.

The first story is a chokepoint. Amazon is Anthropic’s biggest investor. Amazon supplies the chips Anthropic needs to run its models. Amazon’s cloud-computing arm, AWS, hosts the infrastructure. Amazon’s researchers found the bypass. Amazon’s CEO reported it to the Treasury Secretary. And Amazon is now at the table in discussions about how to restore access to a product whose safety claims Amazon’s own researchers demolished. This is not a conflict-of-interest problem to be managed with disclosure. This is the structure Cory Doctorow and Rebecca Giblin describe in Chokepoint Capitalism — an intermediary positions itself in the supply chain and extracts leverage from every transaction that passes through. Amazon sits between Anthropic and its compute, between Anthropic and the government, and — after last week — between Anthropic and the credibility of its own safety claims. Amazon does not need to own Anthropic to control its options. It needs only to be the place where the chips originate, the cloud runs, and the administration takes phone calls.

The second story is a hostage-taking. Some administration officials have said that a resolution should include “an acknowledgment on Anthropic’s part that its rollout of Fable and communication with the White House could have been improved,” according to people familiar with the discussions. The demand is not a technical fix. It is a demand for a public act of contrition, a statement that the company was wrong and the government was right to step in. The models stay offline until the company makes that statement. The government has the power to keep them there, and it is using that power not to mandate a specific security enhancement but to extract a public kneeling.

The sequence is worth laying out plainly. On Friday, after the administration restricted foreign access to Fable 5, Anthropic shut down the model for all users, and did the same for the more powerful Mythos 5, to comply with the new rule. The shutdown was a direct consequence of the government’s action. The administration had earlier feuded with Anthropic over acceptable military use of AI, and the company had been one of the few tech firms to publicly challenge the administration’s AI policy. Now, with the models offline and the company’s valuation hanging in the balance ahead of an initial public offering, the administration is using the shutdown as leverage to extract a statement of compliance. The company’s own security researchers are now in the room, trying to negotiate the technical details, but the technical details are not what the administration is after.

These two stories are not in tension. They are the same story seen from different angles, and they converge on a single fact: the parties negotiating the fix are Anthropic, whose guardrails failed; Amazon, whose researchers exposed the failure; and Commerce, whose evaluation unit missed it. What is absent from the table is any party without a financial or reputational stake in the models going back online. No independent security researchers. No civil-society organizations, whose interest might lie in pointing out that the guardrails have already failed once and that the fix should not be negotiated among the parties who built the thing that failed. Every party at the table shares an interest in one outcome. The guardrails will be reinforced, or they will be declared reinforced, and those two things are not the same.

The engineering substance underneath the negotiation is worth stating plainly. Fable 5 is a general-purpose version of a model capable of carrying out potent cyberattacks. The guardrails were supposed to constrain that capability: to let the public use the general-purpose features while keeping the dangerous ones out of reach. In cryptographic protocol verification, the discipline of proving that a system does what its designer claims and not what some adversary wishes it did, a security claim without a formal proof is just a claim. Anthropic asserted its guardrails held. It did not prove they held. The first party with access and a reason to look walked through them. This is the oldest story in computer security: the distance between the designer’s confidence and the attacker’s capability, which in this case was measured in days.

The Commerce Department’s own evaluation unit assessed Fable 5 before its release — and did not catch the vulnerability Amazon’s researchers found. That same Commerce unit is now at the table negotiating restoration terms. The unit that evaluated the model and declared it safe is now in the room helping decide what to do about the fact that it was wrong. This is not a regulator that failed and then opened an investigation. This is a regulator that failed and then joined the negotiations.

The cost of the shutdown is not merely the lost access for users. It is the establishment of a precedent that the government can pull a model from the market on the basis of a vulnerability that its own evaluators missed when they assessed the model, and can keep it offline until the company agrees to a public statement on the government’s terms. The model is the hostage; the ransom is the acknowledgment. The administration is not protecting critical infrastructure. It is establishing the same kind of chokepoint Amazon had already built, and it is using Anthropic’s own safety failure to do it.

The acknowledgment officials want concerns the rollout and the company’s communication with the White House. Anthropic said Friday it had “worked with the government extensively before releasing” the model, which is precisely the point: the model passed both the company’s own guardrail testing and the government’s evaluation, and both were insufficient. The question is not whether the bypass was real. It is why the same government that had already assessed the model’s security is now treating a single bypass as a reason to extract a political submission. The answer is not in the technical record. It is in the leverage the government has discovered it holds.

Cybersecurity experts urged the White House last week to reverse the restrictions, and their arguments are technically sound: the restrictions are blunt, the foreign-access ban is overbroad, and the models have legitimate uses. But the structural effect of their being right is to provide cover for exactly the outcome every party at the negotiation table already needs. The experts are correct on the merits. The effect of their correctness is to clear the path for a resolution shaped by the parties who failed: the company that built the models, the company that invests in it, and the government that evaluated the models and would now like their builder to say it was wrong.

Independent evaluation of AI models capable of cyberattacks cannot be conducted by the parties who build the models, the parties who invest in them, or the parties who approved them for release. The evaluation must be independent of the financial interest in the outcome, and its findings must be public. This is the standard applied to pharmaceutical trials, aircraft certification, and nuclear-plant inspections — domains where the distance between the safety claim and the counterexample is measured in bodies, not valuations. That AI governance is being negotiated instead of adjudicated is itself the finding.

Anthropic’s responsible-development framework is the brand differentiator that makes a $965 billion valuation and a forthcoming IPO plausible. It is what distinguishes Anthropic from competitors who make fewer safety claims. The framework just failed its first real stress test. The interval between the safety claim and the safety failure resembles what John Kenneth Galbraith, writing about the 1929 crash, called the bezzle: the stock of undiscovered embezzlement that builds up while confidence runs high and the loss has not yet been recognized. Galbraith’s insight was that the bezzle makes everyone feel richer: the company has the valuation, the investors have the returns, the government has the framework. The reckoning is deferred until it is not. It is always deferred until it is not. And when it arrives, it arrives not as a technical finding but as a negotiation among the parties who share an interest in the bezzle’s survival.