DeepSeek V4 Released: 1M Context, Open Weights, and a Low-Priced API

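In January 2025, DeepSeek-R1 matched the reasoning ability of OpenAI o1 at a very low training cost and sent shockwaves through the global AI community. On April 24, 2026, DeepSeek officially released V4 Preview, its largest model family by parameter count, its most architecturally innovative, and its most openly licensed.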

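On launch day, a Hacker News (HN) thread about the release collected 1,875 points and 1,455 comments and went straight to the front page. This is not just a story for China's AI scene: developers worldwide are watching DeepSeek's next move.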

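Two Words: Open Source

The first keyword of the V4 release is not "performance" but "open source".

The weights for both DeepSeek-V4-Pro and DeepSeek-V4-Flash are available for download on HuggingFace, so anyone can deploy them locally, run inference, and fine-tune. The weights are released under the MIT license, one of the most permissive licenses used for large models to date.

One HN user commented: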

"Open Source as it gets in this space, top notch developer documentation, and prices insanely low, while delivering frontier model capabilities. So basically, this is from hackers to hackers. Loving it!"
— jari_mustonen, HN

On documentation quality, another popular comment reads:

"Seriously, why can't huge companies like OpenAI and Google produce documentation that is half this good? No BS, just a concise description of exactly what I need to write my own agent."
— throwa356262, HN

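DeepSeek's release sequencing is also worth noting: the official API documentation went live first, and the press release followed. This "ship first, brag later" style is rare among large companies.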

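Spec Comparison: Two Models, Two Choices

The V4 series consists of two clearly positioned models:

DeepSeek-V4-Pro is the large option: 1.6T (trillion) total parameters, with 49B activated per forward pass. In exchange, it delivers overall performance aimed at the world's top closed-source models. According to DeepSeek, it is the strongest open model in three areas: agentic coding, world knowledge, and math/STEM/coding reasoning.

DeepSeek-V4-Flash is the lightweight option: 284B total parameters, with 13B activated. The smaller scale does not cause a cliff in capability; it buys faster responses and a lower price. DeepSeek says its reasoning ability is close to V4-Pro's and that it matches Pro on simple agent tasks.

Spec	V4-Pro	V4-Flash
Total parameters	1.6T	284B
Activated parameters	49B	13B
Context length	1M tokens	1M tokens
Dual mode	✅ Thinking / Non-Thinking	✅ Thinking / Non-Thinking

Both models share the same core capabilities: a 1M-token context window and switching between Thinking and Non-Thinking modes.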
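To make the sparsity of the two MoE models concrete, the short calculation below derives the share of parameters that is active per forward pass from the figures in the spec table. It is plain arithmetic on the published counts; the function and dictionary names are only illustrative.

```python
# Parameter counts taken from the spec table (total vs. activated).
MODELS = {
    "V4-Pro": {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9, "active": 13e9},
}


def active_fraction(total: float, active: float) -> float:
    """Return the fraction of all parameters used in one forward pass."""
    return active / total


for name, params in MODELS.items():
    frac = active_fraction(params["total"], params["active"])
    print(f"{name}: {frac:.1%} of parameters active")
```

Both models activate only a few percent of their weights per step, which is why activated parameters, not total parameters, drive per-token compute.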

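Architecture Innovation: Token-wise Compression + DSA

V4's core technical innovation is in the attention mechanism.

V3 validated the combination of MLA (Multi-head Latent Attention) and MoE (Mixture of Experts). V4 builds on that foundation with Token-wise Compression and DSA (DeepSeek Sparse Attention):

Token-wise Compression: compresses the token sequence token by token, sharply reducing compute and memory overhead on long sequences
DSA: a sparse attention mechanism that keeps the real cost of a 1M context under control

The result: long context goes from an "optional advanced feature" in the V3 era to the "default standard configuration" in V4, at 1M tokens.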
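For readers new to the idea, here is a toy sketch of what "sparse attention" means in general: each query attends to only the k best-scoring keys instead of all of them. This is a generic top-k illustration in pure Python and is not DeepSeek's DSA algorithm, whose details are in the technical report. The function names and the tiny example vectors are made up for the demonstration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k keys with the highest scaled dot-product score."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    # Keep the indices of the k best-scoring keys; all other positions are skipped.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in selected])
    output = [0.0] * len(values[0])
    for weight, idx in zip(weights, selected):
        for d, component in enumerate(values[idx]):
            output[d] += weight * component
    return output, sorted(selected)


keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0], [40.0]]
out, used = topk_sparse_attention([1.0, 0.0], keys, values, k=2)
print(used, out)  # which positions took part, and the mixed value
```

Note that this toy still scores every key before selecting. A practical long-context design needs a cheaper way to pick candidates, and that part is not modeled here.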

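Agent Capabilities: Open-Source SOTA

V4 was specifically optimized for agent workloads:

Agentic coding benchmarks: ranked first among open models
SWE-bench Verified score of 80.6%, the first open-weights model to pass the 80% mark
Deeply integrated with Claude Code, OpenClaw, and OpenCode
Already used inside DeepSeek for agentic coding workflows

On what the SWE-bench result means, one popular HN comment notes: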

"While SWE-bench Verified is not a perfect benchmark for coding, AFAIK, this is the first open-weights model that has crossed the threshold of 80% score on this by scoring 80.6%. Back in Nov 2025, Opus 4.5 (80.9%) was the first proprietary model to do so."
— primaprashant, HN

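API Pricing: Flash Is Absurdly Cheap

Price (USD per 1M tokens)	V4-Flash	V4-Pro
Input (cache hit)	$0.028	$0.145
Input (cache miss)	$0.14	$1.74
Output	$0.28	$3.48

Flash's pricing has already drawn a lot of discussion on HN. One commenter ran a direct cost comparison: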

"Assuming it is almost as good as Opus 4.6, and assuming we are having a good enough harness, it's now more than 5x cheaper. I just want to remind you that this is happening at the same time as Anthropic A/B tests removal of Code from Pro Plan, and as OpenAI releases gpt-5.5 2x more expensive than gpt-5.4..."
— yanis_t, HN

DeepSeek is now available on OpenRouter. V4-Flash is priced slightly below Gemma 4 31b and also supports prompt caching, which makes it effectively the cheapest option for long-context workloads.
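To see how much prompt caching matters for long-context use, the sketch below estimates the cost of a single request from the prices in the table. The `cached_fraction` parameter and the example token counts are assumptions for illustration; real cache hit rates depend on how much of the prompt is shared between calls.

```python
# USD per 1M tokens, copied from the pricing table in this article.
PRICES = {
    "deepseek-v4-flash": {"cache_hit": 0.028, "cache_miss": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"cache_hit": 0.145, "cache_miss": 1.74, "output": 3.48},
}


def request_cost(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate the USD cost of one request.

    cached_fraction is the share of input tokens served from the prompt
    cache (0.0 = nothing cached, 1.0 = everything cached).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    price = PRICES[model]
    hit_tokens = input_tokens * cached_fraction
    miss_tokens = input_tokens - hit_tokens
    total = (
        hit_tokens * price["cache_hit"]
        + miss_tokens * price["cache_miss"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


# Hypothetical workload: an 800K-token prompt with a 2K-token answer.
for fraction in (0.0, 0.9):
    cost = request_cost("deepseek-v4-flash", 800_000, 2_000, fraction)
    print(f"Flash, {fraction:.0%} cached: ${cost:.4f}")
```

With most of a long prompt cached, the input side of the bill shrinks toward the cache-hit rate, which is one fifth of the cache-miss rate for Flash.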

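⚠️ Note: V4-Pro throughput is currently limited by constrained high-end compute; on OpenRouter, latency is about 1.12 s and speed is about 30 tokens/s. DeepSeek says prices will drop substantially once Ascend 950 deployment is in place, so the Pro model is not yet in its "final form".

The Sober Voices on HN

Beyond the excitement, there are more cautious assessments: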

"Something is odd with this model, their blog posts shows REALLY good results, but in most other third-party benchmarks, people realize it's not really SOTA, even below Kimi K2.6 and GLM-5/5.1."
— XCSme, HN

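The core issue: the SOTA results DeepSeek advertises depend in part on its own proprietary test environment. In independent third-party evaluations, V4 does not lead on every dimension. The takeaway is that model selection cannot rest on official benchmarks alone; running your own tests is what counts.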
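As a starting point for "run your own tests", here is a minimal evaluation harness sketch. It scores any callable that maps a prompt to an answer by exact match. The `stub_model` function and the three test cases are placeholders invented for this example; in practice you would swap in a function that calls the API and use cases drawn from your own workload.

```python
from typing import Callable, Iterable, Tuple


def exact_match_accuracy(
    model: Callable[[str], str],
    cases: Iterable[Tuple[str, str]],
) -> float:
    """Return the share of cases where the model's answer equals the expected one."""
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one test case")
    correct = sum(
        1 for prompt, expected in cases
        if model(prompt).strip() == expected.strip()
    )
    return correct / len(cases)


def stub_model(prompt: str) -> str:
    """Placeholder standing in for a real API call."""
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


CASES = [
    ("2 + 2 = ?", "4"),
    ("Capital of France?", "Paris"),
    ("Largest planet?", "Jupiter"),
]

print(f"accuracy: {exact_match_accuracy(stub_model, CASES):.2f}")
```

Exact match is a deliberately strict metric. For open-ended tasks such as coding, a task-specific check (for example, running unit tests against the output) is usually a better fit.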

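Compared with V3: Not an Upgrade, a Leap

Item	V3	V4
Release date	2024-12-26	2026-04-24
Total parameters	671B	Pro: 1.6T / Flash: 284B
Activated parameters	37B	Pro: 49B / Flash: 13B
Context length	128K	1M (about 8x)
Architecture	MLA + MoE	Token-wise Compression + DSA
Agent	Basic capability	Specifically optimized, SOTA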

How to Use It

API call (switching models is a one-line change):

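import os
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible. The env var name is an assumption.
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # or "deepseek-v4-pro"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum entanglement in simple terms."},
    ],
)
print(response.choices[0].message.content)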

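Switching to Thinking mode:

# The OpenAI SDK rejects unknown keyword arguments, so the DeepSeek-specific
# `thinking` field is passed through `extra_body`.
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[...],  # same message list as in the basic call
    extra_body={"thinking": {"type": "enabled"}},
    reasoning_effort="high",
)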
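If you toggle between the two modes often, it can help to build the request arguments in one place. The helper below is a hypothetical convenience wrapper, not part of any SDK. It assumes the `thinking` field is forwarded through the OpenAI SDK's `extra_body` parameter and that `reasoning_effort` only matters when thinking is on.

```python
def build_chat_request(model, messages, thinking=False, effort="high"):
    """Build keyword arguments for client.chat.completions.create().

    Hypothetical helper: when thinking is enabled, the DeepSeek-specific
    `thinking` field goes into extra_body (assumption) and the reasoning
    effort is set; otherwise only the standard fields are included.
    """
    request = {"model": model, "messages": list(messages)}
    if thinking:
        request["extra_body"] = {"thinking": {"type": "enabled"}}
        request["reasoning_effort"] = effort
    return request


messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
fast = build_chat_request("deepseek-v4-flash", messages)
deep = build_chat_request("deepseek-v4-flash", messages, thinking=True)
print(sorted(fast))
print(sorted(deep))
```

The resulting dictionary can be unpacked into the call, for example `client.chat.completions.create(**deep)`, so the mode switch stays a single argument at the call site.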

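Local deployment: the weights are on HuggingFace and can be served locally with vLLM, SGLang, or LMDeploy. FP8 weights have been verified to work. There is also zero CUDA dependency: the model runs entirely on Huawei chips, which means China's AI ecosystem has delivered a complete technology stack.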
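Before planning a local deployment, it is worth estimating how much memory the weights alone require. The sketch below assumes 1 byte per parameter for FP8 and 2 bytes for BF16, and ignores the KV cache, activations, and framework overhead, so treat the numbers as a lower bound.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_memory_gb(total_params: float, dtype: str = "fp8") -> float:
    """Return the approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


# Total parameter counts from the spec table.
for name, params in {"V4-Pro": 1.6e12, "V4-Flash": 284e9}.items():
    size = weight_memory_gb(params, "fp8")
    print(f"{name}: about {size:,.0f} GB of FP8 weights")
```

Because every expert must be resident even though only a few are active per token, total parameters, not activated parameters, set the memory requirement.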

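What the HN Community Thinks

V4 generated a large discussion on HN, centered on a few points:

1. Open source as a competitive advantage
DeepSeek keeps open-sourcing high-performance models, in sharp contrast to the closed, high-priced direction of GPT-4.5 and Claude 3.7. Developers used the phrase "from hackers to hackers" to express their approval.

2. Pricing that resets expectations
With prices ranging from $0.14 per 1M input tokens (V4-Flash, cache miss) to $3.48 per 1M output tokens (V4-Pro), the accepted price floor for large-model inference has been broken through. Some asked outright: if prices can go this low, does other vendors' "high R&D cost" narrative still hold?

3. The need for third-party verification
Tests by independent developers on HN show that V4 falls short of the officially advertised SOTA in some third-party evaluations, which reinforces the case for running your own tests before choosing a model.

4. What the Huawei chip ecosystem means
DeepSeek V4 has zero CUDA dependency and runs entirely on Huawei Ascend chips. This is more than technical news: it means China's AI ecosystem now has end-to-end, self-controlled capability from training through inference.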

HN discussion link: https://news.ycombinator.com/item?id=47884971 (1,875 points · 1,455 comments)

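DeepSeek V4 in one sentence: open-weight models, a 1M context, and the lowest prices, delivering an experience close to the top closed-source models.

If you are choosing a model for an AI application, or looking for the best price-performance in an inference model, V4-Flash is worth testing hands-on right away.

References: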

Official announcement: https://api-docs.deepseek.com/news/news260424
API documentation: https://api-docs.deepseek.com
Model pricing: https://api-docs.deepseek.com/quick_start/pricing
HuggingFace open weights: https://huggingface.co/collections/deepseek-ai/deepseek-v4
Technical report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
HN discussion: https://news.ycombinator.com/item?id=47884971