Too Dangerous to Release, or Just Too Broken?

Anthropic's "too dangerous to release" AI wasn't too dangerous. It was too broken to serve.

The New York Times covered Mythos like a five-alarm fire. Central banks scrambled. Anthropic said it was holding the model back because it could autonomously find and weaponize software vulnerabilities.

Then the Financial Times reported the actual reason for the limited rollout: Anthropic didn't have enough computing power to serve it reliably. They'd been having outages for weeks. The model wasn't held back because it was too powerful. It was held back because the servers couldn't handle the load.

(The Register called it a "nothingburger." IEEE Spectrum was more diplomatic: "a real but incremental advance." Neither sounds much like the end of the world.)

This isn't new. It's the playbook. And it works because business owners keep taking AI press releases at face value.

Look at what happened a few weeks ago with Opus 4.7. Anthropic's marketing said "smarter." Nate Jones (who actually stress-tested it for days instead of just reading the blog post) found something more honest: better at some hard tasks, worse at web research and terminal work, and quietly more expensive per token. His line, "The backlash and the praise are both describing real things," is the most useful thing anyone said about that launch.

Because the pattern is the same every time. Company announces a breakthrough. Press amplifies it. You feel pressure to react (upgrade, adopt, restructure). Then the independent testers show up two weeks later and say "some things got better, some got worse, and it costs more."

Treat every AI announcement like a vendor press release, because that's what it is. The "safety concerns" are marketing. The "breakthrough" is marketing. The actual changes are usually incremental and mixed.

Three things worth doing instead of reacting to the next launch cycle:

  1. Wait two weeks. Let independent reviewers test it in production; the real picture rarely emerges any sooner.

  2. Test against your own use cases, not theirs. "Smarter" means nothing if it's worse at the specific thing you need it to do. (I went deeper on this earlier this week — skip the leaderboard, run the models against your actual work.)

  3. Watch the price, not just the features. Model updates increasingly come with hidden cost increases (more tokens per response, changed defaults that burn through your budget faster). The sketch after this list shows one rough way to check this and point 2 against your own prompts.
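Here's a minimal sketch of what that looks like in practice, using the Anthropic Python SDK: run a couple of your real prompts through two models and tally what each run actually costs. The model names, prompts, and per-token prices below are placeholders, not anything from a real price sheet; swap in whatever you actually use and whatever your invoice actually says.

```python
# Minimal sketch: run your own prompts through two models and compare output and cost.
# Assumes the official Anthropic Python SDK (`pip install anthropic`).
# Model names, prompts, and per-token prices are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MODELS = ["model-you-use-today", "model-they-just-announced"]  # placeholder names
PROMPTS = [
    "Summarize this contract clause: ...",             # your real work, not a benchmark
    "Draft a reply to this customer complaint: ...",
]
PRICE = {"input": 3 / 1_000_000, "output": 15 / 1_000_000}     # $/token, placeholder rates

for model in MODELS:
    total_cost = 0.0
    for prompt in PROMPTS:
        msg = client.messages.create(
            model=model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        # Price the call from the reported token usage.
        cost = (msg.usage.input_tokens * PRICE["input"]
                + msg.usage.output_tokens * PRICE["output"])
        total_cost += cost
        print(f"{model} | {msg.usage.output_tokens} output tokens | ${cost:.4f}")
        print(msg.content[0].text[:200])  # eyeball the first part of the answer
        print("---")
    print(f"{model}: total ${total_cost:.4f} across {len(PROMPTS)} prompts\n")
```

An afternoon of this against your own prompts tells you more than any launch post: whether the answers actually improved for your work, and whether the new model quietly writes longer (and therefore pricier) responses to get there.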

Next time an AI company says something is too dangerous to release, check whether their servers are up first.