Reputation: 1106
I looked through a lot of documentation but got confused on the retriever part.
I am building a chatbot over a user's custom data.
Here is the approach I am currently following:
def doc_preprocessing(content):
    doc = Document(page_content=content)
    text_splitter = CharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=0
    )
    docs_split = text_splitter.split_documents([doc])
    return docs_split
def embedding_db(user_id, content):
    docs_split = doc_preprocessing(content)
    # Extract text from the split documents
    texts = [doc.page_content for doc in docs_split]
    vectors = embeddings.embed_documents(texts)
    # Store vectors with user_id as metadata
    for i, vector in enumerate(vectors):
        upsert_response = index.upsert(
            vectors=[
                {
                    'id': f"{user_id}",
                    'values': vector,
                    'metadata': {"user_id": str(user_id)}
                }
            ]
        )
This should create embeddings for the given data and store them in Pinecone.
Now the second part is chatting with this data. For QA, I have the following:
def retrieval_answer(user_id, query):
    text_field = "text"
    vectorstore = Pinecone(
        index, embeddings.embed_query, text_field
    )
    vectorstore.similarity_search(
        query,
        k=10,
        filter={
            "user_id": str(user_id)
        },
    )
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type='stuff',
        retriever=vectorstore.as_retriever(),
    )
    result = qa.run(query)
    print("Result:", result)
    return result
but I keep getting:
Found document with no `text` key. Skipping.
When I do QA, it is not referring to the data stored in Pinecone; it is just using plain ChatGPT. I am not sure what I am missing here.
Upvotes: 1
Views: 7149
Reputation: 361
You need to create a prompt using a template and mention your retrieved documents as context in your chain. Alternatively, build your chain like this:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
More details can be found in the LangChain documentation:
https://python.langchain.com/docs/use_cases/question_answering/quickstart
Upvotes: 0