LLM Study Log: Week 02

Day 01

Understanding Today's Large Language Models

LLM Architectures

1. MoE (Mixture of Experts):

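MoE splits a large problem into smaller sub-problems, solves each one, and combines the results. Model scale is one of the key drivers of model quality: under a fixed compute budget, training a larger model for fewer steps often works better than training a smaller model for more steps.
MoE builds on this idea. The model is composed of several specialized sub-models (the "experts"). A plain, dense mixture of experts saves no compute, because the forward pass still evaluates every expert and the backward pass still touches every expert. The saving comes from sparse gating: the gate selects a few experts for each input, sets the weights of the others to zero, and renormalizes the remaining weights, so both the forward and backward passes only need the experts whose gate weight is non-zero.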
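
To make the sparse-gating idea concrete, here is a minimal sketch in plain Python. The experts are toy scalar functions and the gate scores are hard-coded; in a real model both would be learned networks, so every name below is an illustrative assumption rather than any library's API.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(gate_scores, k):
    """Keep the k largest gate probabilities, drop the rest, renormalize."""
    probs = softmax(gate_scores)
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept_mass = sum(probs[i] for i in keep)
    return {i: probs[i] / kept_mass for i in keep}

def moe_forward(x, experts, gate_scores, k=2):
    """Weighted sum over the selected experts only; the others are never called."""
    gates = top_k_gate(gate_scores, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())

# Hypothetical experts: toy functions standing in for feed-forward blocks.
calls = []  # records which experts were actually evaluated

def make_expert(idx, scale):
    def expert(x):
        calls.append(idx)
        return scale * x
    return expert

experts = [make_expert(i, s) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
gate_scores = [0.1, 2.0, 0.3, 1.5]  # assumed output of a gating network
output = moe_forward(1.0, experts, gate_scores, k=2)
print(sorted(calls), output)
```

Only the two experts with the highest gate scores are evaluated, which is where the compute saving comes from.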

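2. Retrieval-based models:

The retrieval-augmented generation (RAG) workflow

Retrieval:
The first stage is retrieval: the user's query is used to fetch relevant information from an external knowledge source. Concretely, an embedding model converts the query into a vector, which is compared against the context vectors stored in a vector database. This similarity search returns the top-k best-matching entries.
Augmentation:
Next, the user's query and the retrieved information are filled into a predefined prompt template. The goal is to give the generation step richer, more contextual input.
Generation:
Finally, the augmented prompt is passed to the large language model (LLM) to produce the output. This stage is the core of RAG: it combines the LLM's generative ability with the information gathered in the two earlier stages, aiming for output that is accurate, rich, and relevant to the context.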

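Naive RAG

Indexing (offline):

Extract plain text from the additional corpus, which may come in various formats (PDF, HTML, Markdown, Word, and so on).
Because the LLM's context window is limited (2K and 4K tokens are common sizes), split the extracted text into chunks.
Use a text embedding model to compute an embedding for each chunk.
Store each embedding together with its chunk as an index entry, so that chunk and embedding map one-to-one (a chunk-embedding pair).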
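
The indexing steps above can be sketched as follows. This is a simplified illustration: the fixed-size character splitter with overlap and the `toy_embed` function are assumptions made for the example, and a real pipeline would use a proper text splitter and an embedding model.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

def toy_embed(chunk, dims=8):
    """Placeholder embedding (character-bucket counts), NOT a real model."""
    vec = [0.0] * dims
    for ch in chunk:
        vec[ord(ch) % dims] += 1.0
    return vec

def build_index(text, chunk_size=200, overlap=50):
    """Return a list of (chunk, embedding) pairs, one entry per chunk."""
    return [(c, toy_embed(c)) for c in split_into_chunks(text, chunk_size, overlap)]

index = build_index("abcdefghij" * 5, chunk_size=20, overlap=5)
for chunk, embedding in index:
    print(repr(chunk), embedding)
```

The overlap keeps a sentence that straddles a chunk boundary at least partly visible in both neighbouring chunks.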

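Online processing:

Retrieval (online):
Use the text embedding model to compute a query embedding for the user's query.
Compare the query embedding with the chunk embeddings in the index, find the k most similar embeddings, and fetch the chunks that correspond to them.
Generation (online):
Merge the query with the retrieved chunks and feed the merged prompt to the LLM to generate the result.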
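
A minimal sketch of the online half, under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, similarity is cosine, and the prompt wording in `build_prompt` is made up for illustration. The final LLM call is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (word -> count); a stand-in for a real model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, index, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Merge the query and the retrieved chunks into one prompt string."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

chunks = [
    "Copenhagen is the capital of Denmark.",
    "Roses need water twice a week.",
    "Denmark is a country in northern Europe.",
]
index = [(c, embed(c)) for c in chunks]
query = "What is the capital of Denmark?"
top_chunks = retrieve(query, index, k=2)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

The resulting `prompt` is what would be sent to the LLM in the generation step.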

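RAG is a paradigm that retrieves additional text from an external knowledge base and uses ICL (in-context learning) to improve LLM generation. Most RAG optimizations target the knowledge base itself, for example finer knowledge granularity and a better retrieval process.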


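2.1 Advanced RAG (enhancements on top of Naive RAG)

2.1.1 Pre-Retrieval Process (pre-retrieval optimization)

Improve data granularity: raise the standardization, consistency, factual accuracy, and contextual richness of the text to improve RAG performance. Examples include removing irrelevant information, resolving ambiguity in entities and data, and updating outdated documents.
Optimize the index structure: tune the chunk size so each chunk captures the relevant context, and query across multiple index paths.
Hybrid retrieval: combine several retrieval techniques, such as keyword search, semantic search, and vector search, to broaden what is retrieved while keeping results consistent.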
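
One way to combine the result lists of several retrievers is reciprocal rank fusion (RRF). The notes do not name a fusion method, so RRF is swapped in here purely as an example; the document ids and the constant `k=60` are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs of a keyword retriever and a vector retriever
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept.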

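2.1.2 Post-Retrieval Process (post-retrieval optimization)

Retrieval often returns a lot of content of varying importance. Merging all of it with the query may exceed the LLM's context limit, increases compute cost, and can introduce noise that hurts generation quality. The retrieved content therefore usually needs further processing:

Re-ranking: a very common technique in search. Traditional search usually orders results by relevance, quality, and similar signals; for LLM generation one must also consider the diversity of the retrieved documents and the LLM's sensitivity to where content sits in the prompt. For example, LostInTheMiddleRanker places the best documents alternately at the beginning and the end of the context window.
Prompt compression: research suggests that noise in the retrieved documents hurts RAG performance. Post-processing can compress irrelevant context, highlight key passages, and shorten the overall context. A dedicated model can also be used to compress, summarize, or filter the documents.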
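
The alternating placement described for re-ranking can be sketched in a few lines. This is an illustration of the idea only, not the actual LostInTheMiddleRanker implementation; the input is assumed to be sorted from most to least relevant.

```python
def lost_in_the_middle_order(docs_by_relevance):
    """Place documents alternately at the front and the back of the context."""
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        if i % 2 == 0:
            front.append(doc)   # 1st, 3rd, 5th ... go to the front
        else:
            back.append(doc)    # 2nd, 4th ... go to the back
    # Reverse the back half so the 2nd-best document ends up last
    return front + back[::-1]

ranked = ["best", "second", "third", "fourth", "fifth"]
print(lost_in_the_middle_order(ranked))
```

The most relevant documents sit at the two ends of the prompt and the least relevant ones in the middle, where the model is assumed to pay the least attention.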

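Graph-RAG (knowledge graphs combined with RAG; works well)

Here the external knowledge base is a knowledge graph (stored in Neo4j), and retrieval combines vector search, keyword search, and graph queries, an approach called hybrid retrieval.

Summary:

Scaling models further requires improving on the dense Transformer.
Combining mixture-of-experts with retrieval-based methods is more effective.
How to design better, scalable architectures remains an open question.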

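Adaptation of Large Models

Why do language models need adaptation?

Downstream tasks can differ from the language model's training task in complex ways. The differences fall into three groups: format, topic, and time. Understanding them clarifies how to adapt a language model to a given downstream task.

Format:

Natural language inference (NLI): a downstream task such as NLI compares two sentences and produces a single classification output (binary in the simplest formulation). A language model, by contrast, is trained to generate the next token or to fill in MASK tokens. NLI requires comparing and reasoning over both sentences, not just predicting the next likely word from the given context.

Topic shift:

Domain-specific needs: a downstream task may focus on a particular topic or domain, such as medical-record analysis or legal-document parsing. These tasks involve specialized terminology and knowledge that are far from the model's general training task.

Temporal shift:

Need for new knowledge: new information keeps emerging over time. For example, GPT-3 finished training before Biden became president, so it may lack information about his presidency.
Need for non-public information: a downstream task may depend on information that was not publicly available at training time, which may call for more domain-specific expertise and adjustment.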

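Adapting a Model to a Task by Fine-Tuning

General adaptation setup

Pre-trained LM:
At the start of adaptation we already have a pre-trained language model with parameters $\theta_{\text{LM}}$. It was trained to understand and generate language, but not for any specific task.
Downstream task dataset:
We are given samples from the downstream task distribution $P_{task}$. These can be instances of text classification, sentiment analysis, and so on. Each sample consists of an input x and a target output y: $\left(x^{(1)}, y^{(1)}\right), \ldots,\left(x^{(n)}, y^{(n)}\right)$.
Adaptation parameters:
To fit the pre-trained LM to the downstream task, we look for parameters $\gamma$ drawn from a parameter family $\Gamma$, which may be a subset of the existing parameters or newly introduced parameters. These parameters adjust the model so that it performs better on the task.
Task loss function:
We define a loss function $\ell_{\text {task }}$ that measures the model's performance on the downstream task. Cross-entropy loss is a common choice; it measures the gap between the predicted probability distribution and the true distribution.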
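Optimization problem:
The goal is to find adaptation parameters $\gamma_{\text {adapt }}$ that minimize the task loss over the whole downstream dataset. Mathematically:

$$\gamma_{\text {adapt }}=\operatorname{argmin}_{\gamma \in \Gamma} \frac{1}{n} \sum_{i=1}^{n} \ell_{\text {task }}\left(\gamma, \theta_{\text{LM}}, x^{(i)}, y^{(i)}\right)$$

This process yields adaptation parameters $\gamma_{\text {adapt }}$, which parameterize the adapted model $p_{adapt}$. In this way a general, task-agnostic pre-trained language model is adapted to a specific downstream task to achieve better performance. The approach combines the generality of the model with effectiveness on the task: it keeps the model's flexibility while ensuring strong performance on the specific task.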
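
A toy illustration of this optimization, under strong simplifying assumptions: the "pre-trained model" is a frozen one-line feature function, the adaptation parameters are a new weight and bias (a tiny prediction head), and the task loss is binary cross-entropy minimized by plain gradient descent. None of these choices come from the notes; they only make the average-loss minimization tangible.

```python
import math

def frozen_lm_feature(x, theta_lm=0.5):
    """Stand-in for a frozen pre-trained model: maps an input to one feature."""
    return theta_lm * x

def predict(gamma, x):
    """Probability of label 1 from the head parameters gamma = (w, b)."""
    w, b = gamma
    return 1.0 / (1.0 + math.exp(-(w * frozen_lm_feature(x) + b)))

def task_loss(gamma, x, y):
    """Binary cross-entropy for one sample, clipped for numerical safety."""
    p = min(max(predict(gamma, x), 1e-12), 1 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mean_loss(gamma, data):
    """Average task loss over the downstream dataset."""
    return sum(task_loss(gamma, x, y) for x, y in data) / len(data)

def adapt(data, steps=500, lr=0.5):
    """Gradient descent on the mean loss; only the head (w, b) is updated."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = predict((w, b), x) - y
            grad_w += err * frozen_lm_feature(x)
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]  # toy downstream samples
gamma_adapt = adapt(data)
print(gamma_adapt, mean_loss(gamma_adapt, data))
```

Here the parameter family contains only newly introduced parameters; full fine-tuning would instead also update the pre-trained parameters.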

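Mainstream adaptation methods

1. Fine-tuning

Fine-tuning uses the language model parameters $\theta_{\text{LM}}$ as the initialization for optimization. The family of optimized parameters $\Gamma$ includes all language model parameters plus the parameters of the task-specific prediction head. The optimizer state from pre-training is discarded.

Steps for fine-tuning a large model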

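2. LLM development tools

Hugging Face

Provides a model hub covering many NLP tasks, such as translation, text generation, and question answering.
Provides tools for fine-tuning pre-trained models on specific datasets.
Provides APIs for accessing and using pre-trained models in applications.
Provides tools for building custom models and deploying them to the cloud.
Provides a large collection of pre-trained NLP models.

LangChain
LangChain is a framework for developing applications powered by language models. Its premise is that the most powerful and differentiated applications will not only call a language model through an API, but will also be:

Data-aware: connect the language model to other data sources.
Agentic: allow the language model to interact with its environment.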

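Day 02

LangChain in Detail

LangChain components

LangChain consists of six parts: Models, Prompts, Indexes, Memory, Chains, and Agents.

1. Models

LangChain provides a standard interface for chat models. The message types it currently supports are AIMessage, HumanMessage, SystemMessage, and ChatMessage, where ChatMessage accepts an arbitrary role parameter. In most cases you only need HumanMessage, AIMessage, and SystemMessage.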

# Import the OpenAI chat model and the message types
from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

# Initialize the chat object
chat = ChatOpenAI(openai_api_key="...")

# Ask the chat model a question
chat([HumanMessage(content="Translate this sentence from English to French: I love programming.")])

# Multiple messages are supported as input
messages = [
    SystemMessage(content="You are a helpful assistant that translates English to French."),
    HumanMessage(content="I love programming.")
]
chat(messages)

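When a user asks the chat model the same question again, LangChain can serve the cached result, which reduces API calls and speeds up the response. Two caching options are available: an in-memory cache and a database cache.

# Import the chat model and the SQLiteCache module
import os
os.environ["OPENAI_API_KEY"] = 'your apikey'
import langchain
from langchain.chat_models import ChatOpenAI
from langchain.cache import SQLiteCache
# Set where the LLM cache data is stored
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
# Load the LLM
llm = ChatOpenAI()

# First request to the model
result = llm.predict('tell me a joke')
print(result)

# Second request with the same question (served from the cache)
result2 = llm.predict('tell me a joke')
print(result2)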
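
The caching idea itself fits in a few lines of plain Python. The sketch below is not how LangChain implements its cache; `CachedLLM` and `fake_llm` are hypothetical names, and the cache is a simple in-memory dict keyed by the exact prompt string.

```python
class CachedLLM:
    """Wrap a model function and reuse answers for prompts seen before."""

    def __init__(self, llm_fn):
        self._llm_fn = llm_fn
        self._cache = {}          # prompt -> cached reply
        self.backend_calls = 0    # how often the real model was invoked

    def predict(self, prompt):
        if prompt not in self._cache:
            self.backend_calls += 1
            self._cache[prompt] = self._llm_fn(prompt)
        return self._cache[prompt]

def fake_llm(prompt):
    """Hypothetical stand-in for a real (slow, billed) model call."""
    return f"(model reply to: {prompt})"

llm = CachedLLM(fake_llm)
first = llm.predict("tell me a joke")
second = llm.predict("tell me a joke")   # answered from the cache
print(first == second, llm.backend_calls)
```

A database-backed cache follows the same pattern, with the dict replaced by a persistent table.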

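2. Embeddings

Embeddings turn text and other content into multi-dimensional vectors, which can then be used for similarity computation and retrieval.

import os
from langchain.embeddings.openai import OpenAIEmbeddings
os.environ["OPENAI_API_KEY"] = 'your apikey'

# Initialize the embedding model
embeddings = OpenAIEmbeddings()

# Vectorize the text with the embedding model
res = embeddings.embed_query('hello world')
print(res)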

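3. LLMs

LangChain integrates with many large language models.

4. Prompts

LangChain provides PromptTemplates, which let you build the prompt dynamically from user input. When users need to enter many similar prompts, a prompt template saves time and effort.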

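from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Template with one variable; the store features are filled in at call time
prompt_template = PromptTemplate.from_template(
    "我正在开一家新的商店，它的主要特点是{store_features}。请帮我想出10个商店的名字。"
)

# Generate store names for the given features
def generate_store_names(store_features):
    prompt = prompt_template.format(store_features=store_features)

    # max_tokens must leave room for ten names; generate() expects a list of prompts
    llm = OpenAI(max_tokens=256, temperature=0.8)
    response = llm.generate([prompt])

    # response.generations holds one list of candidates per prompt
    store_names = [gen[0].text.strip() for gen in response.generations]
    return store_names

store_features = "时尚、创意、独特"

store_names = generate_store_names(store_features)
print(store_names)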

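LangChainHub contains many prompt templates that can be loaded directly through LangChain. Studying how those prompts are designed is also a good source of ideas.

5. Few-shot examples

FewShotPromptTemplate makes few-shot prompting more convenient. It combines a set of example questions and answers (the few-shot examples) with a PromptTemplate that formats each example. When there are many examples, an example_selector can pick a subset of them.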

import os
os.environ["OPENAI_API_KEY"] = 'your apikey'
from langchain import PromptTemplate, FewShotPromptTemplate
from langchain.llms import OpenAI
from langchain.prompts.example_selector import LengthBasedExampleSelector

examples = [
    {"word": "黑", "antonym": "白"},
    {"word": "伤心", "antonym": "开心"},
]

example_template = """
单词: {word}
反义词: {antonym}\n
"""

# Prompt template that formats a single example
example_prompt = PromptTemplate(
    input_variables=["word", "antonym"],
    template=example_template,
)

# Use an example_selector to pick a subset of the examples
example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    # Maximum total length of the selected examples
    max_length=25,
)

# Few-shot prompt template; pass either examples or example_selector, not both
few_shot_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="给出每个单词的反义词",
    suffix="单词: {input}\n反义词:",
    input_variables=["input"],
    example_separator="\n",
)


# Format the few-shot prompt
prompt_text = few_shot_prompt.format(input="粗")

# Call OpenAI
llm = OpenAI(temperature=0.9)

print(llm(prompt_text))

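Indexes

Indexes are ways of structuring documents so that an LLM can interact with them more effectively. This component mainly includes Document Loaders, Text Splitters, VectorStores, and Retrievers.

DocumentLoader

Loads data from a specified source and converts data in a particular format into text, for example CSV, a file directory, or HTML.

TextSplitter

Because models limit the length of their input, very long text has to be split into several smaller pieces. The most basic splitter in LangChain is CharacterTextSplitter, which splits on a given separator ("\n\n" by default) while respecting a maximum chunk length.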
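
The split-then-merge behavior can be sketched in plain Python. This mimics the idea only and is not the CharacterTextSplitter source; the function name and the choice to keep an oversized single piece whole are assumptions of this sketch.

```python
def split_by_separator(text, separator="\n\n", chunk_size=100):
    """Split on the separator, then merge pieces greedily up to chunk_size characters."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Fits, or the chunk is empty: a single oversized piece is kept whole
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

text = "aaaa\n\nbbbb\n\ncccc\n\ndddd"
print(split_by_separator(text, chunk_size=10))
```

Each returned chunk stays within `chunk_size` whenever the individual paragraphs allow it.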

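LangChain also supports several more advanced text splitters.

VectorStores and Retrievers

VectorStores store the extracted text vectors; options include Faiss, Milvus, Pinecone, and Chroma.
A Retriever is an interface that makes stored data easy for the model to query. By LangChain convention, a retriever exposes at least one method, get_relevant_documents, which takes a query string and returns a list of documents.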

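Chains

A chain combines several of the components above into a single, coherent task. For example, a chain can take user input, format it with a PromptTemplate, and pass the formatted prompt to an LLM. More complex chains can be built by combining several chains, or by combining chains with other components.

Steps to build a chain:

Write a prompt template for the question, using curly braces {} to mark the variables to fill in, for example "Tell me a joke about {adjective} tomatoes."

Create an LLMChain instance with its two required arguments, llm and prompt, for example chain = LLMChain(llm=llm, prompt=prompt, verbose=True).

Run the chain by passing the input variable to chain.run, for example res = chain.run("corny").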
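
Stripped of the framework, these steps amount to "format a template, then call a model". The sketch below shows that shape with a hypothetical `SimpleChain` class and an echoing stub in place of a real LLM; it is not LangChain's LLMChain.

```python
class SimpleChain:
    """Minimal chain: fill a prompt template, then pass the prompt to a model."""

    def __init__(self, llm, template):
        self.llm = llm            # any callable taking a prompt string
        self.template = template  # str.format-style template with {variables}

    def run(self, **variables):
        prompt = self.template.format(**variables)
        return self.llm(prompt)

def echo_llm(prompt):
    """Hypothetical model stub that just echoes the prompt it received."""
    return f"LLM saw: {prompt}"

chain = SimpleChain(echo_llm, "Tell me a joke about {adjective} tomatoes.")
result = chain.run(adjective="corny")
print(result)
```

Swapping `echo_llm` for a real model call leaves the rest of the chain unchanged, which is the point of the abstraction.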

LLMChain

LLMChain (one kind of chain) takes a prompt template, formats it with the user input, and returns the LLM's response.

from langchain import PromptTemplate, OpenAI, LLMChain

prompt_template = "What is a good name for a company that makes {product}?"

llm = OpenAI(temperature=0)
llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(prompt_template)
)
llm_chain("colorful socks")

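Both run and calling the chain directly accept a string or a dictionary as input. The two examples below assume a chain built from the joke template "Tell me a joke about {adjective} tomatoes." rather than the {product} template above.

# Calling the chain returns input and output keys; return_only_outputs=True returns only the output keys.
llm_chain("corny", return_only_outputs=True)
# Output
{'text': 'Why did the tomato turn red? Because it saw the salad dressing!'}

# run returns a string
llm_chain.run({"adjective": "corny"})  # dictionary input
# Output
'Why did the tomato turn red? Because it saw the salad dressing!'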

# apply accepts a list of inputs
input_list = [
    {"product": "socks"},
    {"product": "computer"},
    {"product": "shoes"}
]
llm_chain.apply(input_list)
# Output
    [{'text': '\n\nSocktastic!'},
     {'text': '\n\nTechCore Solutions.'},
     {'text': '\n\nFootwear Factory.'}]

# generate is similar to apply but returns an LLMResult.
llm_chain.generate(input_list)

# Output:
    LLMResult(generations=[[Generation(text='\n\nSocktastic!', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nTechCore Solutions.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nFootwear Factory.', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {'prompt_tokens': 36, 'total_tokens': 55, 'completion_tokens': 19}, 'model_name': 'text-davinci-003'})

llm_chain.predict(product="colorful socks")  # inputs are passed as keyword arguments

SimpleSequentialChain

A simple sequential chain feeds the output of each chain into the input of the next.

Example that combines two LLMChains into a sequential chain:

from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain

# Define the first chain
llm = OpenAI(temperature=.7)
template = """You are a playwright. Given the title of play, it is your job to write a synopsis for that title.

Title: {title}
Playwright: This is a synopsis for the above play:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(llm=llm, prompt=prompt_template)

# Define the second chain

llm = OpenAI(temperature=.7)
template = """You are a play critic from the New York Times. Given the synopsis of play, it is your job to write a review for that play.

Play Synopsis:
{synopsis}
Review from a New York Times play critic of the above play:"""
prompt_template = PromptTemplate(input_variables=["synopsis"], template=template)
review_chain = LLMChain(llm=llm, prompt=prompt_template)

# Combine the two LLMChains with a simple sequential chain
overall_chain = SimpleSequentialChain(chains=[synopsis_chain, review_chain], verbose=True)

# Run the sequential chain
review = overall_chain.run("Tragedy at sunset on the beach")

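TransformChain

A transform chain lets us write a custom transform function that processes the input; the processed result is then used as the input of the next chain.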

from langchain.chains import TransformChain, LLMChain, SimpleSequentialChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Load a long text to stand in for an over-length input
with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()

# Transform function: takes and returns a dict; keeps the first three paragraphs
def transform_func(inputs: dict) -> dict:
    text = inputs["text"]
    shortened_text = "\n\n".join(text.split("\n\n")[:3])
    return {"output_text": shortened_text}

# Transform chain: input variable text, output variable output_text
transform_chain = TransformChain(
    input_variables=["text"], output_variables=["output_text"], transform=transform_func
)
# Prompt template text
template = """Summarize this text:

{output_text}

Summary:"""
# Prompt template
prompt = PromptTemplate(input_variables=["output_text"], template=template)
# LLM chain
llm_chain = LLMChain(llm=OpenAI(), prompt=prompt)
# Combine both with a sequential chain
sequential_chain = SimpleSequentialChain(chains=[transform_chain, llm_chain])
# Run
sequential_chain.run(state_of_the_union)
# Result
"""
    ' The speaker addresses the nation, noting that while last year they were kept apart due to COVID-19, this year they are together again.
    They are reminded that regardless of their political affiliations, they are all Americans.'

"""

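SequentialChain

SequentialChain lets each chain have multiple inputs and outputs. With multiple inputs, the important part is to name the input and output variables.

from langchain.chains import LLMChain, SequentialChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# Assumption: the original does not define llm; any LangChain chat model works, ChatOpenAI is only an example
llm = ChatOpenAI(temperature=0)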
# Chain 1: translate the Chinese product name into English
prompt1 = ChatPromptTemplate.from_template(
    "将以下文本翻译成英文: {product_name}"
)
chain1 = LLMChain(
    # 使用的大模型实例
    llm=llm,
    # prompt模板
    prompt=prompt1,
    # 输出数据变量名
    output_key="english_product_name",
)

# Chain 2: generate an English introduction from the English product name
prompt2 = ChatPromptTemplate.from_template(
    "Based on the following product, give an introduction text about 100 words: {english_product_name}"
)
chain2 = LLMChain(
    llm=llm,
    prompt=prompt2,
    output_key="english_introduce"
)

# Chain 3: detect the language of the product name (Chinese or English)
prompt3 = ChatPromptTemplate.from_template(
    "下列文本使用的语言是什么?: {product_name}"
)
chain3 = LLMChain(
    llm=llm,
    prompt=prompt3,
    output_key="language"
)

# Chain 4: summarize the English introduction from Chain 2 in the product name's original language
prompt4 = ChatPromptTemplate.from_template(
    "使用语言类型为: {language} ，为下列文本写一段不多于50字的概述: {english_introduce}"
)
chain4 = LLMChain(
    llm=llm,
    prompt=prompt4,
    output_key="summary"
)

# Build the SequentialChain. Each chain may have multiple inputs and outputs;
# variables are passed along according to the order of the chains and their input/output keys.
overall_chain = SequentialChain(
    chains=[chain1, chain2, chain3, chain4],
    input_variables=["product_name"],
    output_variables=["english_product_name", "english_introduce", "language", "summary"],
    verbose=True
)
product_name = "重庆小面"
res = overall_chain(product_name)

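The steps in short:

For n tasks, first write n LLMChains (a prompt template plus a chain for each).
Create a SequentialChain instance with its three required arguments: chains, input_variables, and output_variables.

chains: the n chains, as a list
input_variables: the initial inputs to the SequentialChain
output_variables: all output variables to return, including intermediate outputs of the SequentialChain

Run the chain by passing the input variable product_name to overall_chain.
The result is a dict made up of all the variables, which makes it easy to pick out the right values when the code is productionized.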
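
The data flow of such a chain can be sketched without any framework: a shared dict of named variables that each step reads from and writes to. The `run_sequential` helper and the lambda "chains" below are hypothetical stubs that stand in for LLM calls.

```python
def run_sequential(steps, inputs, output_variables):
    """Run steps in order; each step reads named variables and writes one output."""
    state = dict(inputs)
    for input_keys, output_key, fn in steps:
        missing = [k for k in input_keys if k not in state]
        if missing:
            raise KeyError(f"step producing {output_key!r} is missing {missing}")
        state[output_key] = fn(*(state[k] for k in input_keys))
    # Return the original inputs plus the requested outputs
    return {**inputs, **{k: state[k] for k in output_variables}}

# Stub steps mirroring the four chains: (input keys, output key, function)
steps = [
    (["product_name"], "english_product_name", lambda name: f"EN({name})"),
    (["english_product_name"], "english_introduce", lambda en: f"Intro for {en}"),
    (["product_name"], "language", lambda name: "Chinese"),
    (["language", "english_introduce"], "summary",
     lambda lang, intro: f"[{lang}] {intro}"),
]

result = run_sequential(
    steps,
    {"product_name": "重庆小面"},
    ["english_product_name", "english_introduce", "language", "summary"],
)
print(result)
```

Because every intermediate value has a name, a later step can consume outputs from any earlier step, not just the one directly before it.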

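LLMRouterChain

LLMRouterChain selects a different chain to run depending on the prompt, which gives branching behavior.

Build steps:
1. [Step 1] Initialize the language model ("qwen:7b")
2. [Step 2] Build the prompt infos (JSON format), each with key, description, and template

[Step 2.1] Build templates for the two scenarios
[Step 2.2] Build the prompt infos

3. [Step 3] Build the destination chains chain_map (JSON format), keyed by the key of each entry in prompt_infos, with a chain as the value
4. [Step 4] Build the router chain router_chain
5. [Step 5] Build the default chain default_chain
6. [Step 6] Build the multi-prompt chain MultiPromptChain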

from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE as RounterTemplate

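## [Step 1] Initialize the language model
# from langchain.llms import OpenAI
# llm = OpenAI()
# llm = AzureChatOpenAI(deployment_name="GPT-4", temperature=0)
from langchain_community.llms import Ollama

ollama_llm = Ollama(model="qwen:7b")

## [Step 2] Build the prompt infos (JSON format), each with key, description, and template
# [Step 2.1] Build templates for the two scenarios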
flower_care_template = """
你是一个经验丰富的园丁，擅长解答关于养花育花的问题。
下面是需要你来回答的问题:
{input}
"""

flower_deco_template = """
你是一位网红插花大师，擅长解答关于鲜花装饰的问题。
下面是需要你来回答的问题:
{input}
"""

# [Step 2.2] Build the prompt infos
prompt_infos = [
    {
        "key": "flower_care",
        "description": "适合回答关于鲜花护理的问题",
        "template": flower_care_template,
    },
    {
        "key": "flower_decoration",
        "description": "适合回答关于鲜花装饰的问题",
        "template": flower_deco_template,
    }
]


## [Step 3] Build the destination chains chain_map, keyed by the key in prompt_infos, with a chain as the value
chain_map = {}

for info in prompt_infos:
    prompt = PromptTemplate(
        template=info['template'],
        input_variables=["input"] #指定输入变量input
    )
    print("目标提示:\n", prompt)
    
    chain = LLMChain(
        llm=ollama_llm,
        prompt=prompt,
        verbose=True
    )
    chain_map[info["key"]] = chain

## [Step 4] Build the router chain router_chain
destinations = [f"{p['key']}: {p['description']}" for p in prompt_infos]
router_template = RounterTemplate.format(destinations="\n".join(destinations))
print("路由模板:\n", router_template)

router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(), #解析模型输出确定最佳目的地
)
print("路由提示:\n", router_prompt)

router_chain = LLMRouterChain.from_llm(
    ollama_llm,
    router_prompt,
    verbose=True
)

## [Step 5] Build the default chain default_chain
from langchain.chains import ConversationChain
default_chain = ConversationChain(
    llm=ollama_llm,
    output_key="text",
    verbose=True
)

## [Step 6] Build the multi-prompt chain MultiPromptChain
from langchain.chains.router import MultiPromptChain

chain = MultiPromptChain(
    router_chain=router_chain,
    destination_chains=chain_map,
    default_chain=default_chain,
    verbose=True
)

# Test 1
print(chain.run("如何为玫瑰浇水？"))
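
The routing logic behind this setup can be sketched in plain Python. In LangChain the router asks an LLM to pick a destination; here a keyword match stands in for that decision, so `keyword_router` and the keyword lists are assumptions for illustration only.

```python
def keyword_router(question, routes):
    """Stand-in for the LLM router: return a destination key, or None if nothing fits."""
    for key, keywords in routes.items():
        if any(word in question for word in keywords):
            return key
    return None

def multi_prompt_run(question, routes, destination_chains, default_chain):
    """Dispatch to the chosen destination chain, falling back to the default chain."""
    key = keyword_router(question, routes)
    chain = destination_chains.get(key, default_chain)
    return chain(question)

# Hypothetical routing table and stub chains
routes = {
    "flower_care": ["water", "fertilize"],
    "flower_decoration": ["bouquet", "arrange"],
}
destination_chains = {
    "flower_care": lambda q: f"[gardener] {q}",
    "flower_decoration": lambda q: f"[florist] {q}",
}
default_chain = lambda q: f"[general] {q}"

print(multi_prompt_run("How do I water roses?", routes, destination_chains, default_chain))
print(multi_prompt_run("Tell me a joke", routes, destination_chains, default_chain))
```

The default chain matters: without it, any question outside the known destinations would have nowhere to go.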

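Memory

The chat API provided by OpenAI has no "memory" of its own. To give a chat memory, we have to maintain the conversation history ourselves and send it to the language model with every request, as follows: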

import openai

# First request
openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ]
)
# Second request (resends the earlier turns)

openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hello, how can I help you?"},
        {"role": "user", "content": "who is more stylish Pikachu or Neo"},
    ]
)

Usually only the most recent turns need to be kept. LangChain provides the Memory component for storing chat history.

Using the Memory component

1. ChatMessageHistory

from langchain.memory import ChatMessageHistory
from langchain_community.llms import Tongyi
llm = Tongyi()

history = ChatMessageHistory()
history.add_user_message("你好")
history.add_ai_message("你好?")
history.add_user_message("请问丹麦的首都是哪里?")
history.add_ai_message("哥本哈根")
print(history.messages)

ret = llm.invoke(history.messages)
print(ret)

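2. ConversationBufferMemory

ConversationBufferMemory is a LangChain memory component that stores the conversation history. It works like a buffer: every message in the conversation (user input and AI response) is stored in order, so each message sent to the LLM carries the conversation history so far.

# ConversationBufferMemory (buffer memory) is the simplest memory mechanism.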
from langchain.chains.conversation.base import ConversationChain
from langchain_community.llms import Tongyi

llm = Tongyi()

from langchain.chains.conversation.memory import ConversationBufferMemory

conversation = ConversationChain(llm=llm,memory=ConversationBufferMemory())

# Conversation on day one
# Turn 1
conversation.invoke("我姐姐明天要过生日，我需要一束生日花束。")
print("第一次对话后的记忆:", conversation.memory.buffer,"\n")

# Turn 2
conversation.invoke("她喜欢粉色玫瑰，颜色是粉色的。")
print("第二次对话后的记忆:", conversation.memory.buffer,"\n")

# Turn 3 (conversation on day two)
conversation.invoke("我又来了，还记得我昨天为什么要来买花吗？")
print("\n第三次对话后时提示:\n",conversation.prompt.template)
print("\n第三次对话后的记忆:\n", conversation.memory.buffer,"\n")

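3. ConversationBufferWindowMemory

ConversationBufferWindowMemory uses a sliding window: it keeps only the most recent turns of the conversation. Like human short-term memory, this manages the conversation context efficiently and reduces memory usage.

# Keep a window of recent turns; turns beyond the window size are dropped
from langchain.memory import  ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=5)  # keep the five most recent turns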

memory.save_context({"input":"你好，我是人类!"},{"output":"你好，我是AI，有什么可以帮助你的吗？"})
memory.save_context({"input":"我想吃鸭肉"},{"output":"好的，我帮你找找鸭肉的做法"})

print(memory.buffer)
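
The sliding-window behavior is easy to reproduce with a bounded deque. The `WindowMemory` class below is a hypothetical stand-in written for illustration, not the LangChain class, and its `save_context` takes plain strings rather than dicts.

```python
from collections import deque

class WindowMemory:
    """Keep only the k most recent (user, AI) turns."""

    def __init__(self, k):
        self._turns = deque(maxlen=k)  # older turns fall off automatically

    def save_context(self, user_input, ai_output):
        self._turns.append((user_input, ai_output))

    @property
    def buffer(self):
        """Render the remembered turns as prompt-ready text."""
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self._turns)

memory = WindowMemory(k=2)
memory.save_context("question 1", "answer 1")
memory.save_context("question 2", "answer 2")
memory.save_context("question 3", "answer 3")  # pushes out the first turn
print(memory.buffer)
```

With `k=2`, the first turn is gone after the third one is saved, so the prompt size stays bounded no matter how long the conversation runs.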

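4. ConversationEntityMemory

ConversationEntityMemory is a specialized LangChain memory component that tracks and stores information about entities mentioned in the conversation. It identifies entities (people, places, product names, and so on) and keeps each entity and its related information in memory.

5. ConversationKGMemory

ConversationKGMemory is a LangChain memory component that maps key information from the conversation history (entities, concepts, and so on) into a knowledge graph, linking the conversation context to a knowledge base.

# Build memory with a knowledge graph
# from langchain.memory import ConversationKGMemory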
from langchain_community.memory.kg import ConversationKGMemory
from langchain_community.llms import Tongyi
llm = Tongyi()

memory = ConversationKGMemory(llm=llm)
memory.save_context({"input": "say hi to dahuang"}, {"output": "who is dahuang"})
memory.save_context({"input": "dahuang is a dog name"}, {"output": "okay"})

print(memory.load_memory_variables({"input": "who is xiaohei"}))
print(memory.load_memory_variables({"input": "who is dahuang"}))

print(memory.get_knowledge_triplets("her favorite color is red"))

print(memory.get_current_entities("what's Sams favorite color?"))
print(memory.get_current_entities("穿着蓝色衣服、手里拿着冰糖葫芦的小明与小花正在去爬山"))
print(memory.get_current_entities("大壮的职业一个消防员"))

6. ConversationSummaryMemory

ConversationSummaryMemory summarizes the whole conversation into a concise summary and remembers that summary.
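
The running-summary pattern can be sketched as follows. In LangChain the summary is produced by an LLM; here `truncate_summarizer` is a deliberately crude, hypothetical stand-in that just keeps the most recent characters, so only the update loop is representative.

```python
class SummaryMemory:
    """Keep one running summary instead of the full transcript."""

    def __init__(self, summarize):
        self._summarize = summarize  # callable(previous_summary, new_lines) -> str
        self.summary = ""

    def save_context(self, user_input, ai_output):
        new_lines = f"Human: {user_input}\nAI: {ai_output}"
        # Fold the new turn into the existing summary
        self.summary = self._summarize(self.summary, new_lines)

def truncate_summarizer(previous_summary, new_lines, max_chars=80):
    """Hypothetical stand-in for an LLM summarizer: keep the most recent characters."""
    merged = (previous_summary + " " + new_lines.replace("\n", " ")).strip()
    return merged[-max_chars:]

memory = SummaryMemory(truncate_summarizer)
memory.save_context("My sister's birthday is tomorrow.", "Would you like a bouquet?")
memory.save_context("She likes pink roses.", "Pink roses it is.")
print(memory.summary)
```

The stored state never grows beyond the summary length, at the cost of losing detail that the summarizer drops.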

Using Memory in chains

1. Using Memory with LLMChain

from langchain.chains.llm import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

from langchain_community.llms import Tongyi
llm = Tongyi()

template = """你是一个机器人助理。
{chat_history}
user:{human_input}
AI:"""

prompt= PromptTemplate(
    template=template,
    input_variables=["chat_history", "human_input"],
)

memory = ConversationBufferMemory(
    memory_key="chat_history",
)

chain = LLMChain(
    llm=llm,
    memory=memory,
    prompt=prompt,
)

print(chain.invoke("中国的首都是哪里？"))
print(chain.invoke("推荐一个旅游景点"))
print(chain.invoke("怎么去？"))

2. Using Memory with ConversationChain

from langchain.chains.conversation.base import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.llms import Tongyi
llm = Tongyi()

memory = ConversationBufferMemory(
    memory_key="history",
    return_messages=True,
)
chain = ConversationChain(
    llm=llm,
    memory=memory
)
chain.invoke("给我讲一个笑话")
chain.invoke("这个不好笑")

print(memory.buffer)

3. Combining several memories in one chain

from langchain.chains.conversation.base import ConversationChain
from langchain.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    CombinedMemory
)
from langchain.prompts import PromptTemplate
from langchain_community.llms import Tongyi
llm = Tongyi()

# Summarize the conversation with ConversationSummaryMemory
summary_memory = ConversationSummaryMemory(
    llm=llm,
    input_key="input"
)
# Buffer the conversation with ConversationBufferMemory
cov_memory = ConversationBufferMemory(
    memory_key="history_now",
    input_key="input",
)

# Combine the two memories
memory = CombinedMemory(
    memories=[summary_memory, cov_memory],
)

TEMPLATE = """下面是一段AI与人类的对话，AI会针对人类问题，提供尽可能详细的回答，如果AI不知道答案，会直接回复'人类老爷，我真的不知道'.
之前的对话摘要:
{history}
当前对话:
{history_now}
Human:{input}
AI："""

prompt = PromptTemplate(
    template=TEMPLATE,
    input_variables=["history", "history_now", "input"],
)

chain = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=prompt
)

print(chain.run("2024年NBA冠军是谁"))
print(chain.run("介绍一下python语言"))

4. Building a question-answering conversation chain

# Build a question-answering conversation chain
from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_community.llms import Tongyi

from langchain_core.documents import Document

llm = Tongyi()

docs = [Document(page_content='这是一些无用的干扰项文本\n，这是一些无用的干扰项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本\n项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本\n项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本\n项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本\n项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本项文本，这是一些无用的干扰项文本\n在2025年NBA总决赛的最后一场比赛中，哈哈队在主场以94-89击败热火，以总比分4-1成功夺得NBA总冠军。这是球队历史上第一次获得总冠军，也是球队自1976年进入NBA以来的最佳成绩。哈哈队的成功主要归功于他们的领袖球员HAHA的出色表现。\n在总决赛的五场比赛中，HAHA展现出了惊人的统治力。他场均贡献30.2分、14个篮板和7.2次助攻，成为球队得分、篮板和助攻的核心。HAHA在进攻端展现出了全面的技术，他的得分能力和篮板能力让热火队无可奈何。同时，他还展现出了出色的组织能力，为球队创造了很多得分机会。\n在总决赛的最后一场比赛中，HAHA更是发挥出色。他在关键时刻承担责任，不仅在进攻端贡献了关键得分，还在防守端起到了重要作用。他的领导能力和稳定性为球队赢得了决胜的胜利。\nHAHA荣获总决赛最有价值球员（MVP）毫无悬念。他在总决赛中的出色表现让他成为了不可或缺的球队核心，也让他获得了职业生涯中的首个总冠军。这一荣誉不仅是对他个人努力的认可，也是对他带领球队取得成功的肯定。\n随着HAHA的崛起，哈哈队在过去几个赛季中逐渐崭露头角。他的全面发展和领导能力使他成为了球队的核心和灵魂人物。通过这次总决赛的胜利，孙健不仅实现了自己的篮球梦想，也为球队带来了无比的荣耀。\nHAHA带领哈哈队赢得2025年NBA总冠军，并凭借出色的表现获得总决赛最有价值球员（MVP）的荣誉。他在总决赛期间的统治力和全面能力使他成为球队的核心，同时也展现了他的领导才能。这次胜利不仅是HAHA个人职业生涯的里程碑，也是哈哈队迈向更高荣耀的关键一步。随着HAHA的领导，哈哈队有望在未来继续取得更多的成功。'),]

template = """下面是一段AI与人类的对话，AI会针对人类问题，提供尽可能详细的回答，如果AI不知道答案，会直接回复'人类老爷，我真的不知道'，参考一下相关文档以及历史对话信息，AI会据此组织最终回答内容.
{context}
{chat_history}
Human:{human_input}
AI:"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "chat_history", "human_input"],
)

# Buffer the conversation with ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    input_key="human_input",
    return_messages=True,
)

# Load the question-answering chain
chain = load_qa_chain(
    llm=llm,
    memory=memory,
    prompt=prompt,
    chain_type="stuff"
)

# chain.run("2024年NBA冠军是谁")
# chain({"input_documents":docs,"human_input":"公司的营销策略是什么？"})
print(chain({"input_documents": docs, "human_input": "2025年NBA冠军是谁？"}))