It’s a super competitive industry, right? And this is showing that it’s competitive globally, not just within the US,
That crazy AI data center build-out that we’ve been talking about for the last couple of years? They don’t need to do that anymore. They can build a lot less because they can provide a lot more services at a much lower price,
It’s so hard to own a scientific breakthrough, such as an AI model advancement,
The United States already has the best closed models in the world. To remain competitive, we must also support the development of a vibrant open-source ecosystem,
All those other frontier model labs — OpenAI, Anthropic, Google — are going to build far more efficient models based on what they’re learning from DeepSeek,
We are fairly freaked out, I suppose, because we thought we had global AI supremacy when, in fact, we should be celebrating,
It’s plausible to me that they can train a model with $6m,
The model itself gives away a few details of how it works, but the cost savings from the main changes they claim – as I understand them – don’t really ‘show up’ in the model itself,
The breakthrough is incredible – almost ‘too good to be true’. The breakdown of costs is unclear,
It’s very much an open question whether DeepSeek’s claims can be taken at face value. The AI community will be digging into them and we’ll find out,
If they’d spend more time working on the code and reproducing the DeepSeek idea themselves, it would be better than talking about the paper,
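The weights are open, so anyone can start that verification today. Here is a minimal sketch using the Hugging Face transformers library to load one of the small distilled R1 models and sample a chain-of-thought answer – the repo id is an assumption, so check huggingface.co/deepseek-ai for current names:

```python
# Minimal reproduction starting point: pull one of the small open R1
# distillations from Hugging Face and sample an answer.
# NOTE: the model_id below is assumed; verify it on the Hub first.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
# Print only the newly generated tokens (the model's reasoning and answer).
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```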
DeepSeek made R1 by taking a base model – in this case V3 – and applying some clever methods to teach that base model to think more carefully,
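One of those methods, per DeepSeek’s R1 report, is Group Relative Policy Optimization (GRPO): the model samples several answers to the same prompt, scores each with a simple rule-based reward (for instance, whether the final answer is correct), and reinforces answers that beat the group average. Below is a minimal sketch of the group-relative advantage computation at the heart of GRPO, with toy rewards and hypothetical names – not DeepSeek’s actual training code:

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantages: score each sampled answer relative to the
    group of answers drawn for the same prompt, so no separate learned
    value (critic) model is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy example: 6 sampled answers to one prompt, reward 1.0 if the
# final answer was correct, 0.0 otherwise (a rule-based reward).
rewards = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
advantages = group_relative_advantages(rewards)
print(advantages)  # correct answers get positive advantage, wrong ones negative

# Each answer's tokens are then up- or down-weighted by its advantage
# in a clipped, PPO-style policy-gradient update.
```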
GPT-4 finished training in late 2022. There have been a lot of algorithmic and hardware improvements since 2022, driving down the cost of training a GPT-4-class model. A similar situation happened with GPT-2: at the time it was a serious undertaking to train, but now you can train it for $20 in 90 minutes,
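For scale, the roughly $6m figure for V3 is just reported GPU-hours multiplied by a rental price. Here is a back-of-the-envelope check using the figures from DeepSeek’s V3 technical report (about 2.788M H800 GPU-hours at an assumed $2 per GPU-hour); like the report itself, it excludes prior research, data, and infrastructure costs:

```python
# Back-of-the-envelope training-cost arithmetic. The GPU-hour count is
# the one reported in DeepSeek's V3 technical report; the $2/hour H800
# rental rate is an assumption, and research/data/infra costs are excluded.
gpu_hours = 2_788_000        # reported H800 GPU-hours for the final V3 run
price_per_gpu_hour = 2.00    # assumed rental price in USD

cost = gpu_hours * price_per_gpu_hour
print(f"Estimated final training run cost: ${cost / 1e6:.2f}M")  # ~$5.58M
```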
These massive-scale models are a very recent phenomenon, so efficiencies are bound to be found,
It’s easy to criticize,
The constraints on China's access to chips forced the DeepSeek team to train more efficient models that could still be competitive without huge compute training costs,
AI models have consistently become cheaper to train over time – this isn’t new,
DeepSeek V3's training costs, while competitive, fall within historical efficiency trends,
It's overturned the long-held assumptions that many had about the computational power and data processing required to innovate,
Given the limitations of purely defensive measures, the US may also ramp up domestic AI investment, strengthen alliances, and refine policies to ensure it maintains leadership without unintentionally driving more nations toward China's AI ecosystem,