ReAct (Reasoning + Acting)
- ReAct (Reasoning + Acting) is a prompting method in which a large language model (LLM) generates interleaved Thoughts and Actions, then uses the Observation returned by each action to update its line of reasoning.
- Because the model alternates between reasoning text and concrete executable operations (such as search API calls), it can pull in external information mid-reasoning and dynamically correct its decision path.
How ReAct Works
- Thought: the model reasons about how to solve the problem in natural language and states the next operation to perform.
- Action: based on that thought, the model calls an external tool (e.g., a Wikipedia search, a calculator, an API) to retrieve the information it needs.
- Observation: the model receives the tool's result and incorporates it into the context.
- Iterate: the latest observation drives a new round of thought and action, repeating until the final answer is produced.
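The loop above can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `fake_llm` returns scripted replies instead of calling a real model, and `tools` holds a toy lookup function, so the sketch shows only the Thought → Action → Observation control flow, not a production agent.

```python
import re

# Toy tool registry standing in for real search / calculator APIs (hypothetical).
tools = {
    "Search": lambda q: "The High Plains rise from around 1,800 to 7,000 ft.",
}

# Scripted model replies standing in for real LLM calls (hypothetical).
replies = iter([
    "Thought: I should look up the High Plains elevation.\n"
    "Action: Search[High Plains elevation]",
    "Thought: I now know the answer.\n"
    "Finish[1,800 to 7,000 ft]",
])

def fake_llm(prompt: str) -> str:
    return next(replies)

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):           # iteration cap prevents infinite loops
        reply = fake_llm(prompt)
        prompt += reply + "\n"
        finish = re.search(r"Finish\[(.*?)\]", reply)
        if finish:                       # terminal action: return the answer
            return finish.group(1)
        action = re.search(r"Action: (\w+)\[(.*?)\]", reply)
        if action:                       # run the tool, feed the result back in
            observation = tools[action.group(1)](action.group(2))
            prompt += f"Observation: {observation}\n"
    return "no answer within step budget"

answer = react_loop("What is the elevation range of the High Plains?")
print(answer)  # → 1,800 to 7,000 ft
```

The key design point is that the tool output is appended to the prompt as an `Observation:` line, so the next round of "thinking" is conditioned on everything seen so far.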
Example Application
Question:
Aside from the Apple Remote, what other devices can control the program Apple Remote was originally designed to interact with?
The ReAct (reasoning + acting) trajectory:
The figure above is an example of the steps ReAct takes when solving this kind of problem.
The ReAct approach has the model verbalize its thought process (reasoning) while carrying out actions as needed (such as looking up information). The benefit is that the model can adjust its strategy as it thinks and, when necessary, interact with the external world to fill in missing information.
An example ReAct prompt:
```
Question What is the elevation range for the area that the eastern sector of the
Colorado orogeny extends into?
Thought 1 I need to search Colorado orogeny, find the area that the eastern sector
of the Colorado orogeny extends into, then find the elevation range of the
area.
Action 1 Search[Colorado orogeny]
Observation 1 The Colorado orogeny was an episode of mountain building (an orogeny) in
Colorado and surrounding areas.
Thought 2 It does not mention the eastern sector. So I need to look up eastern
sector.
Action 2 Lookup[eastern sector]
Observation 2 (Result 1 / 1) The eastern sector extends into the High Plains and is called
the Central Plains orogeny.
Thought 3 The eastern sector of Colorado orogeny extends into the High Plains. So I
need to search High Plains and find its elevation range.
Action 3 Search[High Plains]
Observation 3 High Plains refers to one of two distinct land regions
Thought 4 I need to instead search High Plains (United States).
Action 4 Search[High Plains (United States)]
Observation 4 The High Plains are a subregion of the Great Plains. From east to west, the
High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130
m).[3]
Thought 5 High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer
is 1,800 to 7,000 ft.
Action 5 Finish[1,800 to 7,000 ft]
```
...
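A prompt like the one above is usually assembled as a plain string: one or more complete Question/Thought/Action/Observation exemplars, followed by the new question. A minimal sketch of that assembly — the exemplar text is abbreviated here, and `build_prompt` is a name chosen for illustration, not a library function:

```python
# One abbreviated few-shot exemplar showing the interleaved ReAct format.
EXEMPLAR = """Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, then find the elevation range of the area its eastern sector extends into.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building in Colorado and surrounding areas.
...
Action 5: Finish[1,800 to 7,000 ft]"""

def build_prompt(question: str) -> str:
    # Few-shot exemplar(s) first, then the new question; the model is
    # expected to continue the pattern starting from "Thought 1:".
    return f"{EXEMPLAR}\n\nQuestion: {question}\nThought 1:"

prompt = build_prompt("Aside from the Apple Remote, what other devices can "
                      "control the program it was originally designed to interact with?")
print(prompt.endswith("Thought 1:"))  # → True
```

Ending the prompt mid-pattern (`Thought 1:`) nudges the model to emit a thought first rather than jumping straight to an answer.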
Using ReAct with LangChain
Install and import the required libraries:
```python
%%capture
# update or install the necessary libraries
!pip install --upgrade openai
!pip install --upgrade langchain
!pip install --upgrade python-dotenv
!pip install google-search-results

# import libraries (note: this uses the classic, pre-0.1 LangChain API)
import openai
import os
from langchain.llms import OpenAI
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from dotenv import load_dotenv
load_dotenv()

# load API keys; you will need to obtain these if you haven't yet
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["SERPER_API_KEY"] = os.getenv("SERPER_API_KEY")
```
Configure the LLM and agent:
```python
# note: text-davinci-003 has since been retired by OpenAI; substitute a
# current completion model (e.g. gpt-3.5-turbo-instruct) to run this today
llm = OpenAI(model_name="text-davinci-003", temperature=0)
tools = load_tools(["google-serper", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
```
Input:
```python
agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")
```
The run produces the following trace:
```
> Entering new AgentExecutor chain...
I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: Olivia Wilde started dating Harry Styles after ending her years-long engagement to Jason Sudeikis — see their relationship timeline.
Thought: I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557
Thought: I now know the final answer.
Final Answer: Harry Styles, Olivia Wilde's boyfriend, is 29 years old and his age raised to the 0.23 power is 2.169459462491557.
> Finished chain.
```
Output:
"Harry Styles, Olivia Wilde's boyfriend, is 29 years old and his age raised to the 0.23 power is 2.169459462491557."
Advantages and Challenges of ReAct
Advantages
- Grounded in external information: the model can call tools such as search engines or APIs while reasoning, which improves factual accuracy and reduces hallucination.
- Traceable reasoning: the interleaved record of thoughts and actions is transparent and easy to analyze and tune.
- Strong performance: ReAct outperforms plain Chain-of-Thought prompting on a range of language-understanding and decision-making tasks.
Challenges
- Complex prompt design: you need few-shot exemplars that teach the model the interleaved format and the available actions.
- Tool integration: you must design and maintain the interfaces to external tools and the format of the data they return.
- Loop-control cost: you need an iteration cap or other termination condition to avoid infinite loops and runaway cost.
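The tool-integration challenge largely comes down to parsing the model's `Action` / `Action Input` lines reliably and routing them to the right callable. A minimal sketch of such a parser and dispatcher — the tool names, regex format, and `dispatch` helper are assumptions modeled on the trace above, not a library API:

```python
import re

# Tool registry: maps action names in model output to callables (toy stubs, hypothetical).
TOOLS = {
    "Search": lambda q: f"(stub search result for {q!r})",
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only; never eval untrusted input in production
}

ACTION_RE = re.compile(r"Action: (\w+)\nAction Input: (.+)")

def dispatch(llm_output: str) -> str:
    """Parse an Action / Action Input pair and run the matching tool."""
    m = ACTION_RE.search(llm_output)
    if m is None:
        raise ValueError("model output contains no parsable action")
    name, arg = m.group(1), m.group(2).strip().strip('"')
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](arg)

obs = dispatch("Thought: I need to calculate 29 to the 0.23 power.\n"
               "Action: Calculator\nAction Input: 29**0.23")
print(obs)
```

Real agent frameworks also have to handle malformed output (no action found, unknown tool name), which is why both cases raise explicit errors here rather than failing silently.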
Conclusion
- The ReAct prompting framework interleaves a Thought → Action → Observation loop, so the LLM does not just think; it acts, and revises its reasoning based on what comes back.
- The strategy suits multi-turn question answering, dynamic decision-making, and any scenario that needs real-time lookup of external data, improving both reliability and interpretability.
- As one of the more advanced prompt-engineering methods, ReAct is a natural extension of Chain-of-Thought and an important foundation for building agent-like behavior.
References
Prompt Engineering Guide
Yao et al., 2022
Wei et al., 2022
HotpotQA
ALFWorld
WebShop
