Manoj ahirwar

Reputation: 1106

How to use retriever in Langchain?

I looked through a lot of documentation but got confused about the retriever part.

So I am building a chatbot over users' custom data.

  1. The user feeds in their data.
  2. The data is upserted to Pinecone.
  3. Later, the user can chat with their data.
  4. There can be multiple users, and each user should only be able to chat with their own data.

Right now I am following the approach below.

  1. Storing the user's data in Pinecone:
def doc_preprocessing(content):
    doc = Document(page_content=content)
    text_splitter = CharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=0
    )
    docs_split = text_splitter.split_documents([doc])
    return docs_split

def embedding_db(user_id, content):
    docs_split = doc_preprocessing(content)
    # Extract text from the split documents
    texts = [doc.page_content for doc in docs_split]
    vectors = embeddings.embed_documents(texts)

    # Store vectors with user_id as metadata
    for i, vector in enumerate(vectors):
        upsert_response = index.upsert(
            vectors=[
                {
                    'id': f"{user_id}",
                    'values': vector,
                    'metadata': {"user_id": str(user_id)}
                }
            ]
        )

This should create embeddings for the given data and store them in Pinecone.

The second part is chatting with this data. For QA, I have the following:

def retrieval_answer(user_id, query):
    text_field = "text"
    vectorstore = Pinecone(
        index, embeddings.embed_query, text_field
    )

    vectorstore.similarity_search(
        query,
        k=10,
        filter={
            "user_id": str(user_id)
        },
    )

    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type='stuff',
        retriever=vectorstore.as_retriever(),
    )
    result = qa.run(query)
    print("Result:", result)
    return result

But I keep getting:

Found document with no `text` key. Skipping.

When I do QA, it does not refer to the data stored in Pinecone; it just answers like plain ChatGPT. I am not sure what I am missing here.
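For context, my rough understanding of what the Pinecone vectorstore wrapper does with text_field at query time (a simplified sketch based on how I read the docs, not the actual library code), which seems to be where the warning comes from:

# Simplified sketch (my assumption) of how the wrapper turns Pinecone matches
# back into Documents using the text_field metadata key
from langchain.schema import Document

def similarity_search_sketch(index, embed_query, text_field, query, k, filter):
    response = index.query(
        vector=embed_query(query),
        top_k=k,
        filter=filter,
        include_metadata=True,
    )
    docs = []
    for match in response["matches"]:
        metadata = match["metadata"]
        if text_field not in metadata:
            # A match whose metadata has no `text` key gets skipped with that warning
            continue
        docs.append(Document(page_content=metadata.pop(text_field), metadata=metadata))
    return docs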

Upvotes: 1

Views: 7149

Answers (1)

Farooq Zaman

Reputation: 361

You need to create a prompt using a template and include your retrieved documents as context in your chain. Alternatively, build your chain like this:

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
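For illustration, here is a fuller sketch with the pieces that snippet assumes spelled out. The format_docs helper, the prompt text, and passing your user_id filter through search_kwargs are my additions (assumptions, not something in your code); vectorstore, llm, user_id and query are the objects from your question, and the import paths assume a recent LangChain where these classes live in langchain_core:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Join the retrieved chunks into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# A prompt that actually mentions the retrieved context
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

# Retriever that keeps the per-user filter, so each user only hits their own vectors
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 10, "filter": {"user_id": str(user_id)}}
)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

result = rag_chain.invoke(query)

This way the retriever itself carries the filter, so the chain only ever sees that user's documents instead of whatever a bare as_retriever() returns.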

More details can be found in the LangChain documentation:

https://python.langchain.com/docs/use_cases/question_answering/quickstart

Upvotes: 0
