

Published: 2025-01-28 13:30:17

Category: IT / Technology (twitter.com)

Submitted 4 months ago by 宝玉

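Commentary from a DeepMind Research Scientist:

I read the DeepSeek-R1 paper in full on the day it was released. In my view, GRPO is not the key to its success. The factors that really matter are these, in order of importance:

1. Iterating between reinforcement learning and supervised fine-tuning
2. A hybrid reward scheme: a rule-based RM for deterministic tasks, combined with a neural RM
3. High-quality synthetic data, with human post-processing only where necessary
4. An evaluation protocol that uses 64 inference samples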
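
To make the reward and evaluation points concrete, here is a minimal Python sketch. It is not taken from the paper or from the original comment: the exact-match rule, the `neural_rm` callable, the routing on whether a reference answer exists, and every function name are hypothetical. It also assumes that "64 inference samples" means averaging correctness over 64 sampled answers per question.

```python
import itertools
from typing import Callable, Optional


def rule_based_reward(answer: str, reference: str) -> float:
    """Hypothetical rule-based RM: 1.0 on a normalised exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


def hybrid_reward(
    answer: str,
    reference: Optional[str],
    neural_rm: Callable[[str], float],
) -> float:
    """Route to the rule when a checkable reference exists, else to the neural RM.

    Treating "has a reference answer" as the test for a deterministic task
    is an assumption made for this sketch.
    """
    if reference is not None:
        return rule_based_reward(answer, reference)
    return neural_rm(answer)


def pass_at_1(sample_fn: Callable[[], str], reference: str, k: int = 64) -> float:
    """Estimate pass@1 as the mean correctness of k independently sampled answers."""
    hits = sum(rule_based_reward(sample_fn(), reference) for _ in range(k))
    return hits / k


if __name__ == "__main__":
    # Toy sampler that alternates between a right and a wrong answer.
    answers = itertools.cycle(["42", "41"])
    print(pass_at_1(lambda: next(answers), "42", k=64))

    # Open-ended prompt with no reference: fall back to a stub neural scorer.
    print(hybrid_reward("a free-form essay", None, neural_rm=lambda text: 0.7))
```

Averaging over many samples gives a lower-variance estimate than scoring a single greedy decode, which is one plausible reason the commenter counts the evaluation protocol among the important factors.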

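These advances open up highly promising research directions for PhD students with limited compute. I may later share on social media a few research topics inspired by DeepSeek-R1.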

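Beyond the technical side, two things deserve even more credit:
1/ Openness: research that is not open rarely attracts followers.
2/ Excellent academic storytelling: the paper builds a compelling research narrative, from proof of concept through the complicated process of showing the method's full potential. The methodology is laid out clearly and is easy to follow, a model of how it should be done.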

Closing thought: heroes respect one another, while losers resent one another. Let's keep the competition healthy and stay grateful!


