TLDR - you can hit your “Sonnet only” cap in your Claude subscription and Sonnet is no longer available until the weekly caps reset – but you can keep using Opus if you have “All models” capacity remaining. ![]()
I mean it’s kind of obvious when you read the tool tip on the Sonnet limit, but doesn’t seem intuitive on the surface so it never really sank in.
My loosely considered assumption was that Sonnet was being encouraged as a daily driver and that Sonnet cap somehow allowed extra Sonnet only usage when you ran out of “All model” capacity.
I’ve hit “All model” caps a lot, but I just pay to keep using Opus (upgraded to 20x then paid overage and/or switched to Cursor and/or Windsurf until limits reset) so I never really tested it.
For interactive sessions, I just use Opus. Sonnet is very capable but I push it to its limits and it’s worth paying for Opus so I don’t have to worry about getting stuck and remembering to switch models. Or worse, getting off track and not realizing it quickly enough.
But for automated / consistent workflows, I do use Sonnet as long as it reliably succeeds in evals. Recently I was iterating on a multi-agent pipeline and burning through tons of Sonnet calls (plus all my existing agents/automations) and eventually got capped for Sonnet.
I was curious about the logic behind this but Claude hallucinated the answer since it’s not publicly disclosed (per its searches).
The best working theory is this one:
*Note - Anthropic flipped the sub-cap from Opus to Sonnet in November 2025.
The most likely explanation for the flip: when Opus had the sub-cap, it was because Opus was the most expensive model to serve per-request. But Sonnet is the model that everyone uses all day, so the aggregate demand pressure on Sonnet infrastructure is actually the harder scaling problem. Capping individual Sonnet consumption per user is likely the most effective lever they have to smooth out total load on the clusters serving Sonnet, which handle orders of magnitude more total traffic than Opus clusters.
But at the same time, I probably like that theory since it was just mirroring what I said when I asked the question:
So your original instinct was right — it’s a capacity management play on the infrastructure that takes the most aggregate beating, not necessarily the most expensive per-request model.
