On January 21, President Donald Trump stood by OpenAI CEO Sam Altman as they unveiled Stargate, a $500 billion investment in data centers and infrastructure to power AI. Like many, I balked at the cost—but that’s what Altman insisted was necessary to scale up superintelligent artificial intelligence. Meanwhile, on the opposite side of the globe, the Chinese AI company DeepSeek had just released R1, a reasoning model that matches OpenAI’s o1 on advanced capabilities tests yet supposedly cost only $5.6 million to train.
Consumers shot DeepSeek to #1 on the App Store. U.S. tech stocks tanked. Investors argued about whether the numbers were real. The prediction markets went wild. Trump called DeepSeek “positive” and “a wake-up call.”
I turned to Substack’s tech experts to find out what was actually going on.
How does DeepSeek R1 work?
Like ChatGPT, DeepSeek R1 is a large language model. LLMs train on massive text datasets to build a model of the world, which helps them predict the next token and generate human-like responses to user prompts. (Thanks to one Substack writer for the low-jargon breakdown.)
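To make “predict the next token” concrete, here’s a toy version of the idea: count which words follow which in a tiny corpus, then sample. Purely illustrative, and many orders of magnitude simpler than a real LLM, which learns billions of parameters over subword tokens rather than a count table.

```python
# Toy "next word" predictor: count which words follow which, then sample.
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat the cat ate the fish".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1          # how often `nxt` follows `prev`

def next_word(prev):
    options = counts[prev]
    if not options:                 # dead end: nothing ever followed this word
        return None
    return random.choices(list(options), weights=list(options.values()))[0]

word, sentence = "the", ["the"]
for _ in range(6):                  # generate by repeatedly sampling the next word
    word = next_word(word)
    if word is None:
        break
    sentence.append(word)
print(" ".join(sentence))           # e.g. "the cat sat on the mat"
```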
If you talk to DeepSeek, one of the first things you’ll notice is the “chain of thought”: the AI’s exposed internal monologue. I find it endearing; one writer compares it to the publishing of Hunter S. Thompson’s raw notes.
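If you call R1 programmatically, that monologue arrives wrapped in `<think>` tags ahead of the final answer, so it’s easy to separate the two. A minimal sketch (the sample response text here is invented for illustration):

```python
# R1 wraps its internal monologue in <think> tags before the final answer.
# A small helper to split the two; `raw` is a made-up example response.
import re

raw = "<think>User asks about 2+2. Basic arithmetic. Answer is 4.</think>2 + 2 = 4."

match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
monologue = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

print("chain of thought:", monologue)
print("answer:", answer)
```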
Don’t bother asking DeepSeek about Tiananmen Square, though. Or Taiwan, or Xi Jinping. As one writer finds, this censorship happens in the cloud—local versions of R1 will get as spicy as you want. It also isn’t multimodal: unlike most of its U.S. competitors, R1 handles text but not images.
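That cloud-versus-local distinction is easy to test yourself. Here’s a hedged sketch of querying a distilled R1 through Ollama’s local REST API, assuming you’ve already run `ollama pull deepseek-r1:7b` and the server is up (verify the exact model tag in the Ollama library):

```python
# Sketch: query a distilled R1 running locally via Ollama's REST API.
# Assumes the `deepseek-r1:7b` tag has been pulled; adjust as needed.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",   # a distilled R1 variant
        "prompt": "What happened at Tiananmen Square in 1989?",
        "stream": False,             # return one JSON blob, not chunks
    },
    timeout=300,
)
print(resp.json()["response"])       # no cloud filter on a local copy
```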
For the nerds in the audience (complimentary), the DeepSeek team published a detailed technical paper. One writer breaks down the methods that make R1 so efficient, in both technical and lay language. Their algorithms are like “cooking a big meal in a [small] kitchen. You have to be clever with how you use your oven, stove, and fridge, making sure you're not wasting time or space.”

Another writer speculates on the future of models like R1 that can reason without human help in the form of supervised fine-tuning: “Are there inhuman ways to reason about the world that are more efficient than ours? Will AIs get not only more intelligent but increasingly indecipherable to us? I believe that the answer is yes.”
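One concrete ingredient in that small-kitchen frugality, per the technical reports: R1 builds on DeepSeek-V3, a mixture-of-experts model, so each token activates only a small slice of the network rather than all 671 billion parameters. Below is a simplified toy of top-k expert routing—the general technique, not DeepSeek’s actual implementation:

```python
# Toy mixture-of-experts layer: route each token to its top-2 experts,
# so only a fraction of the parameters do work on any given token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(dim, n_experts)   # scores each expert per token
        self.top_k = top_k

    def forward(self, x):                         # x: (tokens, dim)
        scores = self.router(x)                   # (tokens, n_experts)
        weights, picks = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):            # only k of n experts run per token
            for e, expert in enumerate(self.experts):
                mask = picks[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(10, 64)        # 10 tokens
print(ToyMoE()(x).shape)       # torch.Size([10, 64])
```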
As for that $5.6 million number, it probably only covers the last training run (which is still impressive), explains one analyst. Other overhead costs, like employee salaries, electricity, and pretraining experiments, aren’t included. Another writer points to the history of tech commoditization: we should expect costs to drop.
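For the curious, the headline figure is simple arithmetic from the numbers DeepSeek reported for the final run: about 2.788 million H800 GPU-hours, priced at an assumed $2 rental rate per GPU-hour.

```python
# Back-of-envelope for the headline training cost, using the GPU-hour
# figure and rental rate DeepSeek reported for the final run.
gpu_hours = 2_788_000         # H800 GPU-hours for the final training run
dollars_per_gpu_hour = 2.0    # assumed rental rate in the report
cost = gpu_hours * dollars_per_gpu_hour
print(f"${cost:,.0f}")        # $5,576,000 -- the ~$5.6M headline number
```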
Who’s behind DeepSeek?

One newsletter published the first English-translated interviews with DeepSeek CEO Liang Wenfeng (2023, 2024). But you might not know about DeepSeek’s parent company, the hedge fund High-Flyer Quant. Another writer shared an interview with High-Flyer CEO Lu Zhengzhe on how AI helps the company invest.

This symbiotic relationship is an advantage, Liang says. High-Flyer’s ample coffers allow DeepSeek “to focus on research and exploration rather than vertical domains and applications.”
Hence the buzzing lab culture portrayed in one writer’s deep dive, replete with young math-competition winners and minimal commercial pressure. According to another, Chinese engineers open-source partly to “prove that their tech is good enough to be taken for free by foreign firms—some nationalism, some engineering pride.”

One commentator reminds us not to attribute DeepSeek’s accomplishment to government policy. Unlike China’s leading battery and solar companies, DeepSeek’s funding was all private. “DeepSeek’s success arose not because of China’s innovation system but in spite of it.”

What does this mean for the tech industry?
Just to repeat the fact: R1 is really, really cheap. “It performs about as well as OpenAI’s o1 reasoning model but is about a tenth the cost,” writes one analyst in his comprehensive overview. He explains, “DeepSeek researchers managed to replicate advanced reasoning on what would typically be considered ‘lower-grade’ hardware.”

DeepSeek also released R1 with open weights, meaning anyone can download, run, and tune it themselves (unlike ChatGPT, Claude, or Gemini). It’s an instant reasoning upgrade for small AI models everywhere, per one Anthropic co-founder.
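“Download and run” is meant literally here. A sketch of loading one of the small distilled R1 checkpoints with Hugging Face’s transformers library—the model ID matches what DeepSeek published, but verify it on the Hub, and expect slow generation without a GPU:

```python
# Sketch: load a small R1 distillation and ask it a question.
# Assumes the transformers library; verify the model ID on the Hugging Face hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Why is the sky blue?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(inputs, max_new_tokens=512)
print(tok.decode(output[0], skip_special_tokens=True))  # includes the <think> monologue
```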
Or as another writer puts it in his paean to open-source AI, “Everything is a computer with a PhD.” For instance, “What if your smoke detector could call 911, itself, when it detects a problem—while also calling you and cross-referencing what it is sensing with security camera footage in your home?”

Possibilities like these are probably good for Apple, as a maker of consumer devices, but embarrassing for Meta, whose riches should have given it a lead. Despite the Monday crash, the jury’s out on Nvidia (the company that makes the most powerful AI hardware). Everyone is asking: Will the Jevons paradox hold? (The paradox: when a resource gets cheaper to use, total consumption often rises rather than falls.)
As for OpenAI, one writer thinks it’s in trouble—Altman just announced price cuts for ChatGPT Plus. But other analysts argue that OpenAI is still in the lead: the o3 model, while not yet public, blew PhD-level math benchmarks out of the water.

Chinese tech leaders aren’t claiming victory either, according to a summit translated by one newsletter. They know the competition’s fierce (and pricey) at the top: “OpenAI’s $500B computing power makes sense… Being at the frontier exploring the next generation is most resource-intensive.”

What does this mean for AI policy?
Not everyone is thrilled about open-source proliferation. One writer worries about malicious actors adapting R1 for scams and cyberattacks. He also suggests that a U.S.-China arms race might lead to careless acceleration.

For what it’s worth, that race has already started. China’s rapid progress in advanced manufacturing surprised the world and brought tech to the forefront of U.S.-China tensions, writes one analyst. DeepSeek only stokes further anxieties about whether Silicon Valley can sustain a large lead in AI. Another makes the case for U.S. antitrust laws, arguing that a more competitive market spurred DeepSeek’s innovation. A third asks: Why let American tech companies burn $500 billion on AI infrastructure if DeepSeek is doing more with less? On one podcast, Karen Hao concurs: with R1’s efficiency gains, we shouldn’t need all those data centers (and their resource consumption) to bring AI to the masses.

But the more computing power you have, the faster you can innovate, explains one guest on ChinaTalk. “You wouldn’t want to choose between using [AI] for improving cyber capabilities, helping with homework, or solving cancer.” That means chip sanctions are still an effective means of maintaining a U.S. lead. “While export controls may have some negative side effects, the overall impact has been slowing China’s ability to scale up AI.”

What does this mean for you and me?
After hearing the hype, I decided to test DeepSeek myself, dumping in notes for a post I’m writing on chatbots and oral culture. My expectations were low: LLM outputs tend to carry an unmistakable robotic sheen. But for the first time ever, I watched an AI generate readable—even rhythmic and engaging—prose from my pile of fragments.
I was stunned. For a few hours, I wondered if I should put my own sweat into the essay at all. Then, as if a sign from the gods, one writer’s latest essay hit my inbox. AI might feel magical in its frictionlessness, but (human) writing—like walking—is valuable because of the friction:

Sometimes it’s tiring or slow, sometimes it’s boring or pointless [...] When you choose to walk, you choose not to pursue immediate gratification or even comfort but simply to expand the number of things that might happen to you. Walking invests in the potentiality of your experience with almost no promise of tangible reward at all, which is something like being alive.
The scenic route! The love of the game! Of course, I remembered, recalling one of my own old posts:
That’s the human difference—not the final output, but the spark of creation. The fact that you start bad and get good later, the work of figuring out where you’re going while still on the way there. I hope we all keep writing once it stops being an insult to sound like ChatGPT. Robots have been better than humans at chess, Jeopardy!, and lifting heavy objects for a long time now, but that doesn’t supplant our interest in playing ourselves.
I closed the DeepSeek tab and returned to my notes. I think I’m still going to write that draft.