Research

2 posts

OpenAI Scientist Yao Shunyu: O3 Release and RL’s New Paradigm – AI Enters the Second Half

・ Technology #OpenAI #Reinforcement Learning #AI #Innovations

This post summarizes a talk given by OpenAI Agent Researcher Yao Shunyu at CS 224N and Columbia University.

In this post, we delve into the ideas presented by OpenAI Agent Researcher Yao Shunyu during his talks at CS 224N and Columbia University.

We Stand at the Midpoint of AI

For decades, AI research has centered on developing new training methods and models. This trajectory has proven effective: from defeating world champions at chess and Go, to outperforming most human test takers on the SAT and bar exams, to winning gold medals at the IMO (International Mathematical Olympiad) and IOI (International Olympiad in Informatics). Milestones like Deep Blue, AlphaGo, GPT-4, and the o-series were born from innovations in the underlying training methods, including search, deep RL, scaling, and reasoning.

DeepSeek-R1's Innovation

・ Research #DeepSeek #LLM

Learn about the innovative features of DeepSeek-R1.

Innovation 1: Chain of Thought Self-Evaluation

DeepSeek-R1 uses a technique called “Chain of Thought (CoT),” in which the model explains its reasoning step by step. For example, when solving a math problem, it breaks its thought process down into clear steps. If an error occurs, it can be traced back to a specific step, enabling targeted improvements. This self-reflection mechanism not only enhances the model’s logical consistency but also significantly improves accuracy on complex tasks.
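To make the idea of tracing an error back to a specific step concrete, here is a toy sketch in Python. It is not DeepSeek-R1's actual mechanism: the `Step` class, the `first_faulty_step` function, and the sample trace are all hypothetical, and the "recheck" is plain arithmetic rather than anything a model does internally. It only illustrates why a step-by-step trace is easier to debug than a single final answer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One line of a reasoning trace: what was claimed and how to recheck it."""
    description: str
    claimed: float                   # value the trace asserts for this step
    recompute: Callable[[], float]   # independent recomputation of the step


def first_faulty_step(steps: list[Step], tol: float = 1e-9) -> Optional[int]:
    """Return the index of the first step whose claim fails the recheck, or None."""
    for i, step in enumerate(steps):
        if abs(step.recompute() - step.claimed) > tol:
            return i
    return None


# Hypothetical trace for "What is 15% of 240, plus 12?" with a deliberate slip.
# Later steps reuse the earlier claimed values, so only the original slip fails.
trace = [
    Step("10% of 240 is 24", 24.0, lambda: 0.10 * 240),
    Step("5% of 240 is 14", 14.0, lambda: 0.05 * 240),  # wrong: should be 12
    Step("15% of 240 is 24 + 14 = 38", 38.0, lambda: 24.0 + 14.0),
    Step("38 + 12 = 50", 50.0, lambda: 38.0 + 12.0),
]

faulty = first_faulty_step(trace)
if faulty is not None:
    print(f"First faulty step: {faulty + 1}: {trace[faulty].description}")
```

Because each later step is checked against the values it was given, the check isolates the step where the slip was introduced instead of flagging everything downstream of it.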