ReAct (Reasoning + Acting)
- ReAct (Reasoning + Acting) is a prompting method in which a large language model (LLM) generates interleaved Thoughts and Actions, then uses the Observation returned by each action to update its line of reasoning.
- Because the model alternates between reasoning text and concrete executable operations (such as search API calls), it can pull in external information mid-reasoning and dynamically correct its decision path.
How ReAct Works
- Thought: the model reasons about how to solve the problem in natural language and states the next operation to perform.
- Action: based on that thought, the model calls an external tool (e.g., a Wikipedia search, a calculator, an API) to retrieve the information it needs.
- Observation: the model receives the tool's result and incorporates it into the context.
- Iterate: the latest observation drives a new round of thought and action, repeating until the final answer is produced.
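The loop above can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `fake_llm` returns scripted replies instead of calling a real model, and `tools` holds a toy lookup function, so the sketch shows only the Thought → Action → Observation control flow, not a production agent.

```python
import re

# Toy tool registry standing in for real search / calculator APIs (hypothetical).
tools = {
    "Search": lambda q: "The High Plains rise from around 1,800 to 7,000 ft.",
}

# Scripted model replies standing in for real LLM calls (hypothetical).
replies = iter([
    "Thought: I should look up the High Plains elevation.\n"
    "Action: Search[High Plains elevation]",
    "Thought: I now know the answer.\n"
    "Finish[1,800 to 7,000 ft]",
])

def fake_llm(prompt: str) -> str:
    return next(replies)

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):           # iteration cap prevents infinite loops
        reply = fake_llm(prompt)
        prompt += reply + "\n"
        finish = re.search(r"Finish\[(.*?)\]", reply)
        if finish:                       # terminal action: return the answer
            return finish.group(1)
        action = re.search(r"Action: (\w+)\[(.*?)\]", reply)
        if action:                       # run the tool, feed the result back in
            observation = tools[action.group(1)](action.group(2))
            prompt += f"Observation: {observation}\n"
    return "no answer within step budget"

answer = react_loop("What is the elevation range of the High Plains?")
print(answer)  # → 1,800 to 7,000 ft
```

The key design point is that the tool output is appended to the prompt as an `Observation:` line, so the next round of "thinking" is conditioned on everything seen so far.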
Example Application
Question:
Aside from the Apple Remote, what other devices can control the program Apple Remote was originally designed to interact with?
The ReAct (reasoning + acting) trajectory:
The figure above is an example of the steps ReAct takes when solving this kind of problem.
The ReAct approach has the model verbalize its thought process (reasoning) while carrying out actions as needed (such as looking up information). The benefit is that the model can adjust its strategy as it thinks and, when necessary, interact with the external world to fill in missing information.
An example ReAct prompt:
```
Question What is the elevation range for the area that the eastern sector of the
Colorado orogeny extends into?
Thought 1 I need to search Colorado orogeny, find the area that the eastern sector
of the Colorado orogeny extends into, then find the elevation range of the
area.
Action 1 Search[Colorado orogeny]
Observation 1 The Colorado orogeny was an episode of mountain building (an orogeny) in
Colorado and surrounding areas.
Thought 2 It does not mention the eastern sector. So I need to look up eastern
sector.
Action 2 Lookup[eastern sector]
Observation 2 (Result 1 / 1) The eastern sector extends into the High Plains and is called
the Central Plains orogeny.
Thought 3 The eastern sector of Colorado orogeny extends into the High Plains. So I
need to search High Plains and find its elevation range.
Action 3 Search[High Plains]
Observation 3 High Plains refers to one of two distinct land regions
Thought 4 I need to instead search High Plains (United States).
Action 4 Search[High Plains (United States)]
Observation 4 The High Plains are a subregion of the Great Plains. From east to west, the
High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130
m).[3]
Thought 5 High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer
is 1,800 to 7,000 ft.
Action 5 Finish[1,800 to 7,000 ft]
```
...
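A prompt like the one above is usually assembled as a plain string: one or more complete Question/Thought/Action/Observation exemplars, followed by the new question. A minimal sketch of that assembly — the exemplar text is abbreviated here, and `build_prompt` is a name chosen for illustration, not a library function:

```python
# One abbreviated few-shot exemplar showing the interleaved ReAct format.
EXEMPLAR = """Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, then find the elevation range of the area its eastern sector extends into.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building in Colorado and surrounding areas.
...
Action 5: Finish[1,800 to 7,000 ft]"""

def build_prompt(question: str) -> str:
    # Few-shot exemplar(s) first, then the new question; the model is
    # expected to continue the pattern starting from "Thought 1:".
    return f"{EXEMPLAR}\n\nQuestion: {question}\nThought 1:"

prompt = build_prompt("Aside from the Apple Remote, what other devices can "
                      "control the program it was originally designed to interact with?")
print(prompt.endswith("Thought 1:"))  # → True
```

Ending the prompt mid-pattern (`Thought 1:`) nudges the model to emit a thought first rather than jumping straight to an answer.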
Using ReAct with LangChain
Install and import the required libraries:
```python
%%capture
# update or install the necessary libraries
!pip install --upgrade openai
!pip install --upgrade langchain
!pip install --upgrade python-dotenv
!pip install google-search-results

# import libraries (note: this uses the classic, pre-0.1 LangChain API)
import openai
import os
from langchain.llms import OpenAI
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from dotenv import load_dotenv
load_dotenv()

# load API keys; you will need to obtain these if you haven't yet
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["SERPER_API_KEY"] = os.getenv("SERPER_API_KEY")
```
Configure the LLM and agent:
```python
# note: text-davinci-003 has since been retired by OpenAI; substitute a
# current completion model (e.g. gpt-3.5-turbo-instruct) to run this today
llm = OpenAI(model_name="text-davinci-003", temperature=0)
tools = load_tools(["google-serper", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
```
Input:
```python
agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")
```
The run produces the following trace:
```
> Entering new AgentExecutor chain...
I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: Olivia Wilde started dating Harry Styles after ending her years-long engagement to Jason Sudeikis — see their relationship timeline.
Thought: I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557
Thought: I now know the final answer.
Final Answer: Harry Styles, Olivia Wilde's boyfriend, is 29 years old and his age raised to the 0.23 power is 2.169459462491557.
> Finished chain.
```
Output:
"Harry Styles, Olivia Wilde's boyfriend, is 29 years old and his age raised to the 0.23 power is 2.169459462491557."
Advantages and Challenges of ReAct
Advantages
- Grounded in external information: the model can call tools such as search engines or APIs while reasoning, which improves factual accuracy and reduces hallucination.
- Traceable reasoning: the interleaved record of thoughts and actions is transparent and easy to analyze and tune.
- Strong performance: ReAct outperforms plain Chain-of-Thought prompting on a range of language-understanding and decision-making tasks.
Challenges
- Complex prompt design: you need few-shot exemplars that teach the model the interleaved format and the available actions.
- Tool integration: you must design and maintain the interfaces to external tools and the format of the data they return.
- Loop-control cost: you need an iteration cap or other termination condition to avoid infinite loops and runaway cost.
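The tool-integration challenge largely comes down to parsing the model's `Action` / `Action Input` lines reliably and routing them to the right callable. A minimal sketch of such a parser and dispatcher — the tool names, regex format, and `dispatch` helper are assumptions modeled on the trace above, not a library API:

```python
import re

# Tool registry: maps action names in model output to callables (toy stubs, hypothetical).
TOOLS = {
    "Search": lambda q: f"(stub search result for {q!r})",
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only; never eval untrusted input in production
}

ACTION_RE = re.compile(r"Action: (\w+)\nAction Input: (.+)")

def dispatch(llm_output: str) -> str:
    """Parse an Action / Action Input pair and run the matching tool."""
    m = ACTION_RE.search(llm_output)
    if m is None:
        raise ValueError("model output contains no parsable action")
    name, arg = m.group(1), m.group(2).strip().strip('"')
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](arg)

obs = dispatch("Thought: I need to calculate 29 to the 0.23 power.\n"
               "Action: Calculator\nAction Input: 29**0.23")
print(obs)
```

Real agent frameworks also have to handle malformed output (no action found, unknown tool name), which is why both cases raise explicit errors here rather than failing silently.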
Conclusion
- The ReAct prompting framework interleaves a Thought → Action → Observation loop, so the LLM does not just think; it acts, and revises its reasoning based on what comes back.
- The strategy suits multi-turn question answering, dynamic decision-making, and any scenario that needs real-time lookup of external data, improving both reliability and interpretability.
- As one of the more advanced prompt-engineering methods, ReAct is a natural extension of Chain-of-Thought and an important foundation for building agent-like behavior.
References
Prompt Engineering Guide
Yao et al., 2022
Wei et al., 2022
HotpotQA
ALFWorld
WebShop
