top of page
  • LinkedIn
  • Bluesky
  • X

Fable 5 Is Back, But We're Still Slowing Down Defenders

Updated: 2 days ago

The export controls are lifted, but their echoes remain in guardrails that trip over defensive work and fall back to weaker models. The fact that Claude Fable 5 and Mythos 5 are available to defenders again, and Anthropic has written up how it is redeploying Fable 5 is mostly good news. We should not bench our best players mid-game while trying to secure our infrastructure against AI-enabled attacks.


But hold the victory lap. The MVPs are back, yet the team captains changed the rules so our defense must move slower than the other team.


Diagram comparing normal and Fable 5 safeguards, showing classifier boundary shift allowing fewer benign prompts

Anthropic's diagram showing more benign prompts are blocked as a "safety margin".


“Fixing Jailbreaks” Slows Defenders


When this began, I warned that “fixing” these so-called jailbreaks would only slow down defenders. That is exactly what happened. Fable 5 now falls back to Opus 4.8 for coding, debugging, and other defensive tasks that trip the new guardrails.


Anthropic admits the guardrails “come at the cost of flagging benign requests more often during routine coding and debugging tasks.” Routine coding and debugging is the daily work of defense.


How sensitive are the new guardrails? If bee flatulence trips a bioweapons filter, imagine what happens to defensive cyber work.


Now I’m curious about what crazy bioweapon can be synthesized from bee farts. The presence or absence of a denial itself is likely an info leak.


Anthropic's own research on constitutional classifiers called an earlier guardrail prototype too impractical to deploy, because it “refused too many harmless queries and cost a lot of computational resources to run”.


When defensive prompts get flagged and downgraded, the organizations that need AI’s help most are left more exposed. Most will never qualify for Mythos access, so they will be left behind.


From Own Goal to Friendly Fire


I called these export controls an own goal from the start.


The wave of self-imposed guardrails has escalated to friendly fire, and not just at Anthropic. Every US-based model is now under pressure to adopt overly sensitive guardrails to avoid being singled out for special export controls.


The casualties will span our most valuable and most vulnerable systems, already struggling to keep up with what these models can do offensively.


Adversaries Never Stopped


US export controls do nothing to slow attackers with access to open weight and foreign models. Other frontier models would find the same bugs featured in the research paper that lit this regulatory powder keg.


Chinese models have been accelerating, in part by distilling US frontier models. Cutting off Fable 5 and Mythos 5 inconvenienced them too, but it did not slow them down. They kept right at it with GPT-5.5 and other highly capable models.


Self-Limiting Models Are More Dangerous Than No Guardrails


I have watched regulators draft export controls without the technical foresight to see the harm to defenders that takes years to undo. On the U.S. Wassenaar technical experts delegation, I renegotiated controls on intrusion software that, as first written, would have required export licenses for the time-sensitive sharing at the heart of incident response and vulnerability disclosure.


In the 90s, strong encryption was labeled a munition, so browsers and web services shipped weak ciphers. Attackers loved it. Stealing data was easy pickings.

The lesson I wish we would stop relearning: over-controlling dual-use technology only weakens your own side.


Not all guardrails need to go, just the ones most likely to trip defenders and seriously drive up compute costs. Those 90s export controls sparked a wildfire of global exploitation, not better security. We need to keep this new blaze contained.


Give Me Model Liberty, or Give Me Technical Debt


Regulators gonna regulate, and hackers gonna hack. Policy needs to match the speed of the AI train without sprinting down a path that makes things worse. Just in time to celebrate 250 years of the United States, let's let model freedon ring.


The models’ capabilities are not what make them dangerous. The bugs are. The bugs do not disappear because a model’s guardrails made them harder to find. The weaker the models, the faster the technical debt pile grows.


We should not be building models with trigger-happy guardrails that shoot down defenders and burn excess compute. We should be pushing for broad defender access as fast as we can, because the adversaries are not taking any holidays. Security through obscurity is still not the answer.


Right now, we are handing China extra penalty shots while tying our goalie’s hands behind their back.


It is not too late to change the play. Equip our teams with the best tools, and rally in the field as one cyber nation, indivisible, with security and defense for all.


Katie Moussouris is the founder and CEO of Luta Security, a company that can help scale your vulnerability management to meet the AI moment. She co-authored the international standards for vulnerability disclosure and handling (ISO/IEC 29147 and 30111), founded Microsoft Vulnerability Research and Microsoft's first bug bounty program, and architected Hack the Pentagon. She served on the U.S. Wassenaar technical experts delegation and now serves on the U.S. Commerce Department's Information Systems Technical Advisory Committee (ISTAC).

Comments


bottom of page