Reputation: 555
I have created a RetrievalQA chain, but am facing an issue. When calling the chain, I get the following error: ValueError: Missing some input keys: {'query', 'typescript_string'}
My code looks as follows:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-large",
                                   model_kwargs={'device': 'mps'},
                                   encode_kwargs={'device': 'mps', 'batch_size': 32})
vectorstore = FAISS.load_local("vectorstore/db_faiss_bkb", embeddings)
retriever = vectorstore.as_retriever(search_kwargs={'k': 1, 'score_threshold': 0.75}, search_type="similarity")
llm = build_llm("modelle/mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf")  # build_llm is my own helper that loads the local GGUF model
def build_retrieval_qa(llm, prompt, vectordb):
    chain_type_kwargs = {"prompt": prompt,
                         "verbose": False}
    dbqa = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type='stuff',
                                       retriever=vectordb,
                                       return_source_documents=True,
                                       chain_type_kwargs=chain_type_kwargs,
                                       verbose=True)
    return dbqa
prompt = """
<s> [INST] You are given a task and a user input. If there is relevant information in the context, please add this information to your answer.
### Here the Task: ###
{typescript_string}
### Here the context: ###
{context}
### Here the User Input: ###
{query}
Answer: [/INST]
"""
prompt_temp = PromptTemplate(template=prompt, input_variables=["typescript_string", "context", "query"])
dbqa1 = build_retrieval_qa(llm=llm, prompt=prompt_temp, vectordb=retriever)
question = "What is IGB?"
types = "Answer shortly!"
dbqa1({"query": question, "typescript_string": types})
With this code, the error above occurs on the last line.
The weird thing is that it works with a plain LLMChain from LangChain, without retrieval:
from langchain.chains import LLMChain
llm_chain = LLMChain(
llm=llm,
prompt= prompt_temp,
verbose=True,
)
test = llm_chain({"type_string": types, "input": question})
test
This works and I am getting a correct response. I am using
Langchain == 0.1.0
So is there something wrong with my PromptTemplate?
Upvotes: 1
Views: 6205
Reputation: 11
When incorporating JSON-like examples within Python format strings, particularly in templates or prompts for language models, it is crucial to double the braces ({{ and }}) around JSON objects. Single braces are interpreted by Python as placeholders for variable substitution; doubling them escapes them, so the JSON format can be included directly in the string without being mistaken for a variable to be replaced. This ensures that any JSON-like syntax within your string is interpreted as literal text rather than a placeholder. For example:
# Llama-2-style delimiters (definitions assumed here; adjust to your model)
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

system_message_template = "<s>" + B_INST + B_SYS + """
Use the following context to answer the questions:
--------------
{context}
--------------
Please provide the answers in JSON format.
Example: Who is the Primary Borrower
"Primary Borrower: ": [
{{
"value": "HAPPY MAILMAN MIDTOWN LLC ",
"key_confidence": 82.62330627441406,
"val_confidence": 98.17585754394531
}}
]
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
""" + E_SYS+E_INST+"</s>"
Note: Only the context placeholder should be in single braces ({ }). JSON-like structures must be in double braces ({{ }}).
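A quick self-contained check of the escaping rule (plain Python str.format, which is what an f-string PromptTemplate uses under the hood):
template = 'Context: {context}\nExample: {{"value": "HAPPY MAILMAN MIDTOWN LLC"}}'
print(template.format(context="some retrieved text"))
# Context: some retrieved text
# Example: {"value": "HAPPY MAILMAN MIDTOWN LLC"}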
Upvotes: 1
Reputation: 555
Found a way to solve the issue; I posted it in this discussion: https://github.com/langchain-ai/langchain/discussions/11542#discussioncomment-8184412
Here is the answer in case the webpage changes: I got it working, but discovered a weird behaviour. My prompt & code look as follows, and it works:
rag_prompt = """
<s> [INST] Below you are given a task. Complete it based on the user input.
If the context contains relevant information that improves your answer, include it in your answer.
It is important that you do not invent any additional information, but only use, where appropriate, what is contained in the context.
The answer should not repeat itself.
### Here the task: ###
{typescript_string}
### Here the context for the user input: ###
{{context}}
### Here the user input: ###
{{question}}
Answer: [/INST]
"""
rag_prompt = rag_prompt.format(typescript_string="Rewrite the bullet points into running text.")  # fills {typescript_string}; the doubled braces collapse to single ones
def set_rag_prompt():
    # {typescript_string} was already filled in by .format() above, so only
    # context and question remain as template variables
    return PromptTemplate(template=rag_prompt, input_variables=['context', 'question'])
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-large",
                                   model_kwargs={'device': 'mps'},
                                   encode_kwargs={'device': 'mps', 'batch_size': 32})
vectorstore = FAISS.load_local("vectorstore/db_faiss_bkb", embeddings)
retriever = vectorstore.as_retriever(search_kwargs={'k': 1, 'score_threshold': 0.75}, search_type="similarity")
llm = build_llm("modelle/mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf")
qa_prompt = set_rag_prompt()
def build_retrieval_qa(llm, prompt, vectordb):
    chain_type_kwargs = {"prompt": prompt,
                         "verbose": False}
    dbqa = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type='stuff',
                                       retriever=vectordb,
                                       return_source_documents=True,
                                       chain_type_kwargs=chain_type_kwargs,
                                       verbose=True)
    return dbqa

dbqa = build_retrieval_qa(llm=llm, prompt=qa_prompt, vectordb=retriever)
dbqa({"query": "Test Query"})
The weird behaviour is in the naming of the prompt input variables: it only works if the input-variable pair is "context" and "question". Before (when it did not work), "{question}" was named "{query}" in the prompt, and calling the RetrievalQA chain with dbqa({"query": "Test Query"}) gave me a KeyError for 'query'. After renaming it to "question", the code runs through.
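For anyone wondering why exactly these two names work: as far as I can tell from the LangChain 0.1.x internals, RetrievalQA takes its input under the key "query" but hands it to the document prompt under the fixed key "question", with the joined documents under "context". A simplified mimic of that hand-off (mimic_retrieval_qa is a hypothetical helper for illustration, not LangChain API):
from langchain.prompts import PromptTemplate

demo_prompt = PromptTemplate.from_template("Context: {context}\nQuestion: {question}")

def mimic_retrieval_qa(inputs, docs):
    # RetrievalQA reads inputs["query"] for retrieval ...
    question = inputs["query"]
    # ... the stuff chain joins the retrieved documents into one string ...
    context = "\n\n".join(docs)
    # ... and always formats the prompt with "context" and "question",
    # so a {query} placeholder in the prompt never receives a value.
    return demo_prompt.format(context=context, question=question)

print(mimic_retrieval_qa({"query": "Test Query"}, ["some retrieved text"]))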
Upvotes: 5
Reputation: 2470
The problem is that the values of {typescript_string} and {query} have not been transferred into the template; dbqa1({"query": question, "typescript_string": types}) only provides values to the retrieval step, not to the prompt.
I suggest using RunnablePassthrough; here is an example with a Mistral-7B model downloaded locally (actually, in this code everything runs locally).
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_texts(
    ["Harry Potter's owl is in the castle"], embedding=embeddings)
retriever = vectorstore.as_retriever()
# from langchain.prompts import PromptTemplate
from langchain_core.prompts import ChatPromptTemplate
template = """Answer the question based on on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
from langchain_community.llms import GPT4All
model = GPT4All(
    model="/home/jeff/.cache/huggingface/hub/gpt4all/mistral-7b-openorca.Q4_0.gguf",
    device='gpu',
    n_threads=8)
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
result = chain.invoke("Where is the hedwig?")
print(result)
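To also cover the extra variable from the original question, the chain can take a dict input and route the keys with itemgetter (the usual LCEL idiom; the prompt below is my own illustration, not part of the original code):
from operator import itemgetter

multi_prompt = ChatPromptTemplate.from_template(
    """### Task: ###
{typescript_string}
### Context: ###
{context}
### User input: ###
{question}
Answer:""")

multi_chain = (
    {
        # route the question both to the retriever and to the prompt,
        # and pass the extra task string straight through
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
        "typescript_string": itemgetter("typescript_string"),
    }
    | multi_prompt
    | model
    | StrOutputParser()
)
print(multi_chain.invoke(
    {"question": "Where is the owl?", "typescript_string": "Answer shortly!"}))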
PS: Credit for this issue goes to Jerry, with whom I discussed the matter yesterday.
Upvotes: 0