The new chatbot from DeepSeek has made a significant impact on the AI market, introducing itself with the intriguing statement:
"Hi, I was created so you can ask anything and get an answer that might even surprise you."
This AI model has not only become a formidable competitor but also contributed to one of NVIDIA's largest stock price drops. DeepSeek's innovative approach to AI development sets it apart, utilizing advanced technologies such as:
Multi-token Prediction (MTP): This technique lets the model predict several upcoming tokens at once instead of strictly one at a time, enhancing both accuracy and efficiency.
Mixture of Experts (MoE): The model contains 256 expert sub-networks, of which only eight are activated for each token, an architecture that speeds up AI training and boosts performance (a simplified routing sketch follows this list).
Multi-head Latent Attention (MLA): By repeatedly focusing on the most significant parts of the input and compressing the extracted details into a compact latent representation, MLA reduces the chance of missing crucial information while keeping memory use low, enabling the AI to capture important nuances (see the second sketch below).
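To make the MoE idea more concrete, here is a minimal top-k routing layer written in PyTorch. It is an illustrative sketch rather than DeepSeek's actual implementation: the real architecture also uses shared experts, load-balancing terms, and heavily optimized expert dispatch, and every size here except the 256-expert / 8-active split is an arbitrary assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: a router scores all experts for each token,
    but only the top-k experts are actually run, so most parameters stay idle."""

    def __init__(self, d_model=64, n_experts=256, k=8):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # gating scores, one per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                 # x: (n_tokens, d_model)
        scores = self.router(x)                           # (n_tokens, n_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1)   # pick 8 of 256 experts per token
        weights = F.softmax(top_vals, dim=-1)             # weight only the chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                        # naive loop; real systems batch by expert
            for slot in range(self.k):
                expert = self.experts[top_idx[t, slot].item()]
                out[t] += weights[t, slot] * expert(x[t])
        return out

tokens = torch.randn(4, 64)        # 4 tokens with hidden size 64
print(TopKMoE()(tokens).shape)     # torch.Size([4, 64])
```

In production systems the gating weights are computed for a whole batch and tokens are grouped by expert so each expert runs a single matrix multiplication; the per-token loop above simply keeps the routing logic easy to follow.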
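The memory-saving side of MLA can be sketched in a similar spirit. The simplified layer below stores one small latent vector per token and reconstructs the multi-head keys and values from it, so the cache holds far fewer numbers than standard attention. It is an assumption-laden illustration, not the published design: causal masking, query compression, and the decoupled rotary position embeddings are omitted, and all sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedMLA(nn.Module):
    """Toy multi-head latent attention: keys and values are rebuilt from a small
    shared latent per token, so the cache stores d_latent numbers per token
    instead of 2 * n_heads * head_dim."""

    def __init__(self, d_model=64, n_heads=4, d_latent=16):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress to latent (this is what gets cached)
        self.k_up = nn.Linear(d_latent, d_model)      # reconstruct keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)      # reconstruct values from the latent
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                             # x: (batch, seq, d_model)
        b, s, _ = x.shape
        latent = self.kv_down(x)                      # (batch, seq, d_latent) -- the compact cache
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)  # no causal mask, purely illustrative
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.out_proj(out)

x = torch.randn(1, 10, 64)          # 1 sequence of 10 tokens
print(SimplifiedMLA()(x).shape)     # torch.Size([1, 10, 64])
```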
DeepSeek, a prominent Chinese startup, claims to have developed a competitive AI model at minimal cost, stating that it spent only $6 million on training DeepSeek V3 using just 2,048 graphics processors. However, analysts from SemiAnalysis have revealed that DeepSeek operates a vast computational infrastructure, utilizing approximately 50,000 Nvidia Hopper GPUs, including 10,000 H800 units, 10,000 H100s, and additional H20 GPUs. These resources are spread across multiple data centers and used for AI training, research, and financial modeling.
The company's total investment in servers is around $1.6 billion, with operational expenses estimated at $944 million. DeepSeek is a subsidiary of the Chinese hedge fund High-Flyer, which established it as a separate AI-focused division in 2023. Unlike most startups that rely on cloud computing, DeepSeek owns its data centers, giving it full control over AI model optimization and allowing it to roll out innovations faster. The company remains self-funded, which enhances its flexibility and speeds up decision-making.
DeepSeek also attracts top talent, recruited primarily from leading Chinese universities, with some researchers earning over $1.3 million annually. Despite this, the claim of training DeepSeek V3 for just $6 million seems unrealistic, as that figure covers only GPU usage during pre-training and excludes research, model refinement, data processing, and infrastructure costs.
Since its founding, DeepSeek has invested over $500 million in AI development. Its compact structure allows it to implement AI innovations quickly and effectively, unlike larger, more bureaucratic companies.
DeepSeek's example shows that a well-funded, independent AI company can compete with industry leaders. However, experts note that the company's success is due to significant investments, technical breakthroughs, and a strong team, rather than a "revolutionary budget" for AI model development.
Despite this, DeepSeek's costs remain lower than those of its competitors: while DeepSeek spent about $5 million training R1, the training of GPT-4o reportedly cost around $100 million.