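The Large Reasoning Model era is almost synonymous with the era of Reinforcement Learning + LLM. But RL is highly specialized: at ML conferences, dedicated RL researchers work through the math on the spot with pen and paper, and the learning curve is steep. Sharing an introductory RL text that covers everything from the MDP fundamentals through PPO to combinations with LLMs such as RLHF, all explained accessibly. Reinforcement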


Published: 2025-03-20 03:30:28

( twitter.com )

Submitted 3 months ago by 马东锡 NLP 🇸🇪

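The era of Large Reasoning Models is almost synonymous with the era of Reinforcement Learning + LLMs.

But RL is a highly specialized field. At ML conferences, researchers who focus on RL work through the mathematical derivations on the spot with pen and paper, and the learning curve is steep.

Here is an introductory RL text that covers everything from the fundamentals (MDPs) through PPO to combinations with LLMs such as RLHF, explaining deep material in accessible terms.

Reinforcement Learning: An Overview: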
https://t.co/rjYSpOtbJl


