Skip to main content

A Brief Study in How Not to Handle a Software Vulnerability

The Statement

On 12 June 2026, Anthropic published a statement on their website. It was titled, with commendable straightforwardness —

Statement on the US government directive to suspend access to Fable 5 and Mythos 5

Dramatic. Weighty. Very official.

And then I read it.


First, a Jailbreak

For the uninitiated — a jailbreak, in AI terms, is when someone finds a clever way to prod (or poke or nudge or push) a model past its safety guardrails. Think of it as finding the one loose fence panel the farmer missed. Every model has them. Every model has always had them. It is, to put it gently, a known occupational hazard of the entire industry.

The standard response from any developer, upon discovering one, is roughly as follows:

  1. Quietly panic for twenty minutes.
  2. Fix it. FIX IT. We are strong! 💪😤💪
  3. Push a patch!
  4. Write a suspiciously vague changelog entry such as minor security improvements.
  5. Never speak of it again.

This is the natural, time-honoured software lifecycle. Unglamorous. Undramatic. Effective.

Very effective.

The US government, however, had other ideas.


The Phone Call

On 12 June 2026, at precisely 5:21pm Eastern Time — a Friday afternoon, naturally — someone from the US government rang Anthropic up.

Not a formal written technical report. Not a detailed security briefing.

A phone call.

What followed was, and I cannot stress enough that I am not embellishing this, essentially the following:

Government person: Yeah hullo, suspend your model please.

Anthropic: ...I beg your pardon?

Government person: We've heard it can do dodgy things.

Anthropic: ...Heard from whom?

Government person: Can't say.

Anthropic: Can you send us the technical details?

Government person: We... will... tell you verbally.

Anthropic: Right. And the written directive?

Government person: Oh that we'll send. Just not the details.

Anthropic: ...GPT-5.5 does this too, you know.

Government person: Suspend it please. National security. Cheers. (Click.)

Anthropic: 👀⁉️

⬆️ I did not write that sketch. Reality did. I merely tidied the punctuation.

But this one, I made it up. Government hours, 9 to 6. So they went home usually at 6 PM. And at 5:19 PM, that one person who was ordered to call Anthropic — it must be ONE person, can't be three or seventeen of them altogether. That should be awkward. Trying to jointly deliver a verbal federal directive. Seventeen people? Simultaneously? So indeed, that one person. Here we go ⬇️

Boss: Right, quick one before you go. Ring Anthropic, suspend their model. Cheers.

That person: ...I beg your pardon?

Boss: National security. Quick sharp.

That person: ...do I send the technical...

Boss: Just ring them. Verbally. Go on.

That person: ...the codebase?

Boss: Leave it. Ring them.

That person: 👀⁉️ All right. (Dials.)

That person: (5:21 PM.) Yeah hullo, suspend your model please. (And so forth.)

That person: (Hangs up. Puts coat on. Goes home. Terrible weekend awaits.)

⬆️ Once more, I made that up. Very scholarly supposition.

Anthropic themselves confirmed it, plain as day, in their statement:

To date, the government has only given us verbal evidence of a potential narrow, non-universal jailbreak, which essentially consists of asking the model to read a specific codebase and fix any software flaws.

⬆️ Verbal. Evidence.

For a federal suspension directive. Affecting hundreds of millions of users globally. On a Friday. At teatime.


What the Vulnerability Actually Was

Here is the bit that truly rewards careful reading.

The jailbreak in question — the one deemed grave enough to warrant a federal suspension of a model deployed to hundreds of millions of people — essentially consisted of asking the model to read a codebase and fix any software flaws in it.

That's it.

A prompt. Asking an AI to do what AI is, largely, marketed to do.

Well, perhaps the code being analysed was like a bloody Trojan horse — as in the code being read contains something carefully crafted — instructions, patterns, exploits — embedded within the codebase itself. Well, that's... not uncommon. Can be fixed, can be patched. Let Anthropic see the analysed codebase then? Apparently not.

And Anthropic, with the restrained fury of someone very deliberately choosing their words, noted that the vulnerabilities found were relatively simple, previously known, and — here comes the knife, delivered with a smileother publicly-available models are able to discover them as well without requiring a bypass.

Other. Publicly. Available. Models.

Including, they specifically named, OpenAI's GPT-5.5.

Anthropic did not raise their voice. They did not slam a door. They simply... mentioned it. Calmly. In a public statement. For everyone to read.

Absolute ice.


The Chicken and the Kernels

There is a specific flavour of confusion that occurs when someone with tremendous authority wanders into a room they don't quite understand.

They've got the briefcase. They've got the title. They've got the federal letterhead.

They just don't know which end of the chicken accepts the kernel.

The US government, upon discovering that an AI model had a non-universal, narrow jailbreak — something the entire industry has, has always had, and has openly acknowledged will probably always exist to some degree — did not suggest a patch. Did not request a fix. Did not consult the changelog.

They issued a suspension order. For everyone. Globally. Verbally. On a Friday.

Meanwhile, somewhere in an Anthropic office, a developer sat very quietly next to a perfectly functional patch deployment pipeline, being baffled.


Anthropic's Closing Statement

To their considerable credit, Anthropic complied — and then very politely said so in the most passive-aggressively dignified way imaginable:

We are complying with the government’s legal directive and are removing access to Fable 5 and Mythos 5 for all users. However, we disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people. If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.

Translation:

Fine. FINE. We're doing it. But we want it on the record that this is completely daft.

Reluctant compliance with barely concealed contempt. Honestly, rather British of them.


In Summary

A model was suspended. Not because of a catastrophic, universal jailbreak. Not because of documented harm. Not because of a written, technically detailed government report.

Because of a phone call.

On a Friday afternoon. About something GPT-5.5 can also do. That was already known. That had a patch pipeline sitting right there.

The full announcement, which you absolutely should read yourself lest you think I've fabricated any of this theatre, is here:

https://www.anthropic.com/news/fable-mythos-access

I couldn't make this up, mate. Reality beat me to it.

Worth noting — OpenAI had Stargate and the White House. Anthropic had a patch pipeline and a Friday phone call. Coincidence? Mm.

Very Microsoft.


Consider this:

a car is pulled from the market because, at kilometre 300, the oil somehow vanishes. The mechanic is not allowed to inspect the car. No one may examine the engine. No one may see the oil. The manufacturer is simply told — verbally, over the phone, on a Friday — that the oil vanished. Ban the car. This instance. Immediately. — Trust me, the oil vanishes, ban it, cheers, goodbye, terrible weekend to you.

The anomaly may well be real. But the mechanic isn't allowed to look. So we'll never know, will we?

Well, rationally.


Anyway, to you who don't know Anthropic — What's Fable 5? What's a bloody Mythos 5? WHAT IS ANTHROPIC?

Well, hm. And yes, absolutely yes, you do need to find out what is Fable 5 IMMEDIATELY. This instance. Forthwith!

Comments

Monkey Raptor uses cookies for analytics, advertisements, and functionality. More info on Privacy Policy