
Exploring LangChain’s ChatAgents

This blog will explore one of LangChain’s agents: the chat agent with ReAct logic. A good number of LangChain agents are mentioned in the LangChain documentation; however, we are going to focus on the ReAct agent only.

So what is the ReAct Agent according to the LangChain documentation?

This agent uses the ReAct framework to determine which tool to use based solely on the tool’s description. Any number of tools can be provided. This agent requires that a description is provided for each tool.

Note: This is the most general purpose action agent.

This blog also presents a simple implementation of a chat agent which uses three of the tools packaged with LangChain v0.0.220. For more details about our playground agent, please see below.

Main Idea

The chat agent with ReAct logic has access to a specific list of tools and to a Large Language Model (LLM). Upon receiving a user-generated message, the chat agent asks the LLM which tool is best suited to answer the question. Optionally, the LLM might also send the final answer directly.

It might choose a tool. If that is the case, the question or a keyword is executed against the tool. The tool then returns an output, which the LLM uses again to plan what to do next: either pick another tool or give the final answer.

So the agent uses the LLM to plan what to do next in a loop, until it finds the final answer or gives up.

Chat Agent Flow

Here is the chat agent flow according to the LangChain implementation:

Chat Agent Flow

In a typical scenario for a web or command-line application, the flow can be split into two parts:

  • Setup flow: the main part used to set up the agent, including the tools and the LLM.
  • Execution flow: consists of two loops. The outer loop processes the user input, while the inner loop handles the agent’s interaction with the tools and the LLM.

Understanding the Execution Flow

The setup flow is typically just a sequential prelude to the main execution flow:

Chat agent execution flow

The flow executes the following steps:

  • Accept the user input. The user input is typically a question entered via a web, mobile, or command-line UI.
  • The agent starts its work: it asks the LLM which tool to use to give the final answer.
  • At this stage, the first process gateway is reached. It has three outputs:
    - Use tool: the LLM decided to use a specific tool. The flow continues below in “Tool replies”.
    - Give LLM-based answer: the LLM came up with the final answer.
    - Answer not understood: the LLM answer is inconclusive. The process exits here with an error.
  • Tool replies: the tool sends a message to the second gateway.
  • The second workflow gateway is reached. It has three possible outcomes:
    - Normally, the output of the tool is routed to the LLM and we return to the initial step of the agent loop.
    - If the executed tool is marked as a “returning tool”, the response from the tool is the final answer.
    - An error condition occurs: if the tool throws an error, a timeout occurs, or the maximum number of attempts is reached, the process exits with an error.
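
The loop described by the steps above can be sketched in plain Python. This is a minimal illustration only: the names fake_llm and run_agent, the stubbed responses, and the dummy tool registry are all invented for this sketch and are not part of LangChain.

```python
import json

# Stubbed "LLM": first chooses a tool, then produces a final answer
# once an observation from a tool is present in the scratchpad.
def fake_llm(scratchpad: str) -> str:
    if "Observation:" not in scratchpad:
        return json.dumps({"action": "wikipedia", "action_input": "Albert Einstein"})
    return "Final Answer: Albert Einstein was a theoretical physicist."

# Dummy tool registry keyed by tool name.
tools = {"wikipedia": lambda query: f"Wikipedia article about {query}"}

def run_agent(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_llm(scratchpad)
        if reply.startswith("Final Answer:"):
            # First gateway, "give LLM-based answer": the LLM produced the final answer.
            return reply[len("Final Answer:"):].strip()
        # First gateway, "use tool": run the chosen tool and record the observation,
        # then loop back and ask the LLM again (second gateway, normal route).
        step = json.loads(reply)
        observation = tools[step["action"]](step["action_input"])
        scratchpad += f"\nObservation: {observation}"
    # Error condition: maximum number of attempts reached.
    raise RuntimeError("Agent gave up: too many steps")

print(run_agent("Who is Albert Einstein?"))
```

Running this prints the stubbed final answer after one tool round-trip, which mirrors the tool/LLM loop from the diagram.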

A Very Simple Wikipedia, DuckDuckGo, Arxiv Agent

We have built a very simple agent that uses three built-in LangChain tools:

  • Wikipedia (the beloved online encyclopedia)
  • Arxiv (an online archive for scientific papers)
  • DuckDuckGo (a privacy-oriented search engine)

We used these three tools because they work out of the box in LangChain and do not require any registration or paid subscription.

This command-line application can be used to talk about different topics and to ask questions like the following:

  • Who is Donald Duck?
  • What is the weather today in London?
  • Who is Albert Einstein?
  • Who will be the presidential candidates in the next US presidential election?
  • Which are the most relevant publications about neural network attention layers in 2020?

Below is an excerpt of an interaction with the tool:

Playing around with the agent

We have tried to colour-code the outputs of the tool:

  • green: successful response
  • red: error message with some explanation
  • blue: intermediate step, typically mentioning the tool and the question sent to the tool

Implementation

Our little chat application can be found in this GitHub repository:

GitHub - gilfernandes/agent_playground: a small demo project with a functional LangChain-based agent (github.com).


The main agent code is in agent_playground.py

The agent is configured using this code:

class Config():
    """
    Contains the configuration of the LLM.
    """
    model = 'gpt-3.5-turbo-16k'
    # model = 'gpt-4'
    llm = ChatOpenAI(model=model, temperature=0)

cfg = Config()

As you can see, we are using the gpt-3.5 API.

The agent is set up in the function below:

def create_agent_executor(cfg: Config, action_detector_func: callable, verbose: bool = False) -> AgentExecutor:
    """
    Sets up the agent with three tools: wikipedia, arxiv, duckduckgo search.
    :param cfg: the configuration with the LLM.
    :param action_detector_func: a more flexible output parser implementation, better at guessing the tool from the response.
    :param verbose: whether there will be more output on the console or not.
    """
    tools = load_tools(["wikipedia", "arxiv", "ddg-search"], llm=cfg.llm)
    agent_executor: AgentExecutor = initialize_agent(
        tools,
        cfg.llm,
        agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
        verbose=verbose
    )
    agent = agent_executor.agent
    agent.output_parser = ExtendedChatOutputParser(action_detector_func)
    return agent_executor

As you can see, the three tools “wikipedia”, “arxiv”, and “ddg-search” are loaded here, and the agent executor uses the CHAT_ZERO_SHOT_REACT_DESCRIPTION agent type.

You may also notice that we have added a custom output parser. The output parser is tasked with parsing the output coming from the LLM. We wanted a more flexible output parser implementation that is better at detecting which tool to use, mainly because we got a large number of errors during testing. The implementation of this parser can be found in the file chat_output_parser.py.

This is the custom implementation of the output parser:

class ExtendedChatOutputParser(ChatOutputParser):

    action_detector_func: Callable

    def __init__(self, action_detector_func: Callable):
        super().__init__(action_detector_func=action_detector_func)

    def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
        includes_answer = FINAL_ANSWER_ACTION in text
        try:
            action = self.action_detector_func(text)
            response = json.loads(action.strip())
            includes_action = "action" in response
            if includes_answer and includes_action:
                raise OutputParserException(
                    "Parsing LLM output produced a final answer "
                    f"and a parse-able action: {text}"
                )
            print(get_colored_text(f"Tool: {response['action']}", "blue"))
            print(get_colored_text(f"Input: {response['action_input']}", "blue"))
            print()
            return AgentAction(
                response["action"], response.get("action_input", {}), text
            )
        except Exception as e:
            if not includes_answer:
                raise OutputParserException(f"Could not parse LLM output: {text}: {str(e)}")
            return AgentFinish(
                {"output": text.split(FINAL_ANSWER_ACTION)[-1].strip()}, text
            )

This implementation allows specifying a custom function for detecting the action, or, in other words, for determining which tool to use next.

The function we wrote to detect the next action from the LLM output can be found in agent_playground.py:

def action_detector_func(text):
    """
    Method which tries to better understand the output of the LLM.
    :param text: the text coming from the LLM response.
    :return: a JSON string with the name of the tool to query next and the input for that tool.
    """
    splits = text.split("```")
    if len(splits) > 1:
        # Original implementation + json snippet removal
        return re.sub(r"^json", "", splits[1])
    else:
        lower_text = text.lower()
        tool_tokens = ["wiki", "arxiv", "duckduckgo"]
        token_tool_mapping = {
            "wiki": "Wikipedia",
            "arxiv": "arxiv",
            "duckduckgo": "duckduckgo_search"
        }
        for token in tool_tokens:
            if token in lower_text:
                return json.dumps({
                    'action': token_tool_mapping[token],
                    'action_input': text
                })
        raise OutputParserException('Could not find a wiki, arxiv, or duckduckgo action or a final answer.')

In this function, we look not only for the expected JSON output, but also for words which might indicate which tool to use next.

One of the problems with LLMs is that they are somewhat unpredictable and express themselves in unexpected ways, so this function simply tries to capture what the LLM is trying to convey in a more flexible manner.
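
To show the idea in isolation, here is a stripped-down, self-contained version of this fallback heuristic together with two sample LLM outputs. The function detect_action is a simplification written for this sketch; it is not the exact repository code (the FENCE constant is built programmatically only so the snippet displays cleanly).

```python
import json
import re

FENCE = "`" * 3  # triple backtick, built programmatically so the snippet displays cleanly

def detect_action(text: str) -> str:
    # Prefer a fenced json block, as in the standard ReAct prompt format.
    splits = text.split(FENCE)
    if len(splits) > 1:
        return re.sub(r"^json", "", splits[1]).strip()
    # Fallback: guess the tool from keywords in free-form text.
    mapping = {"wiki": "Wikipedia", "arxiv": "arxiv", "duckduckgo": "duckduckgo_search"}
    lower = text.lower()
    for token, tool in mapping.items():
        if token in lower:
            return json.dumps({"action": tool, "action_input": text})
    raise ValueError("No tool action or final answer found")

# Well-formed output: the JSON snippet is extracted as-is.
good = f"Action:\n{FENCE}json\n" + '{"action": "arxiv", "action_input": "attention layers"}' + f"\n{FENCE}"
print(json.loads(detect_action(good))["action"])   # arxiv
# Free-form output: the tool is guessed from the word "wiki".
loose = "I think I should look this up on wikipedia."
print(json.loads(detect_action(loose))["action"])  # Wikipedia
```

The second case is exactly the situation the custom parser is meant to rescue: the LLM ignored the JSON format, but its intent is still recoverable from a keyword.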

Observations

We have noticed that LangChain uses a special prompt to ask the LLM how to react to the input. The prompt used in this library is this one:

Answer the following questions as best you can. You have access to the following tools:

Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.
arxiv: A wrapper around Arxiv.org Useful for when you need to answer questions about Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, Statistics, Electrical Engineering, and Economics from scientific articles on arxiv.org. Input should be a search query.
duckduckgo_search: A wrapper around DuckDuckGo Search. Useful for when you need to answer questions about current events. Input should be a search query.

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the “action” field are: Wikipedia, arxiv, duckduckgo_search

The $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:

```
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
```

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action:
```
$JSON_BLOB
```
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin! Reminder to always use the exact characters `Final Answer` when responding.

The prompt is sent to the LLM as a system message, together with the question or the observations coming from the tool responses.
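
To make this concrete, here is how such a prompt can be assembled from tool names and descriptions. This is an illustrative, stdlib-only sketch with an abridged template; it is not LangChain's actual prompt-building code.

```python
# Illustrative sketch: building a ReAct-style system prompt from tool
# names and descriptions. The template text is abridged.
TEMPLATE = """Answer the following questions as best you can. You have access to the following tools:

{tool_lines}

The only values that should be in the "action" field are: {tool_names}"""

tools = [
    ("Wikipedia", "A wrapper around Wikipedia. Input should be a search query."),
    ("arxiv", "A wrapper around Arxiv.org. Input should be a search query."),
    ("duckduckgo_search", "A wrapper around DuckDuckGo Search. Input should be a search query."),
]

prompt = TEMPLATE.format(
    tool_lines="\n".join(f"{name}: {description}" for name, description in tools),
    tool_names=", ".join(name for name, _ in tools),
)
print(prompt)
```

This also explains why the agent "determines which tool to use based solely on the tool's description": the descriptions are the only information about the tools that ever reaches the LLM.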

We were curious to find out how you can steer the behaviour of the LLM in this scenario.

Final Thoughts

In this story, we have tried to describe how a simple chat agent works and to understand the internal mechanics of chat agents. You can enhance the capabilities of Large Language Models with extra tools, which extend the knowledge of the LLM into areas it normally has no access to. Instead of being limited to the old knowledge base it was trained on, the LLM can use publicly available information. Agents are a good option if you want to access cutting-edge news with an LLM.

There is a lot more you can do with agents, like letting them engage in adversarial scenarios or against code bases, but we will be looking at those scenarios in our upcoming stories.
