Zendoric

Anthropic's Alibaba Accusation Turns the US-China AI Race Into a Fight Over Data Theft

🕒 Published on Zendoric: July 4, 2026 · 00:29

Anthropic claims Alibaba used 29 million interactions with Claude to train its Qwen model series, calling it the largest extraction campaign of its kind. If accurate, it reframes the China-catch-up narrative: the story is not only compute and talent but also harvesting a rival's outputs at scale.

Anthropic has accused Alibaba of systematically using 29 million interactions with its Claude models to train the Qwen series, describing it as the largest data-extraction campaign of its type. This is an allegation from Anthropic, not an independently verified fact, and should be read as such — but the scale claimed is striking.

We've repeatedly noted that China's frontier labs — GLM, Qwen, DeepSeek, Kimi — have closed the gap with Western models remarkably fast, especially in open-weight releases. The open question has always been how much of that speed comes from genuine research progress versus distillation of outputs from more capable Western systems. An accusation like this one puts a concrete, adversarial frame on a dynamic that's usually discussed abstractly.

It also exposes a structural vulnerability in how frontier labs operate: any model exposed via API is, by design, queryable at scale, and the prompt-response pairs a client collects can themselves serve as training data if captured and repurposed. Export controls on chips get most of the geopolitical attention, but data and output extraction may be an equally important vector for narrowing capability gaps, and one that is far harder to police.

Our take: whether or not this specific accusation is fully substantiated, it's a preview of a recurring conflict as AI competition intensifies. Expect more disputes over IP, training data provenance and model outputs between US and Chinese labs. None of this undermines the longer-term case for AI-driven abundance — but it does mean the rules of engagement around data, IP and openness need to mature quickly, because right now enforcement is lagging far behind capability.
