langchain vectorstore question and answer from a single embedding in vectorstore

I have created a vector store from a series of paragraphs taken from a text document. The text was deliberately split into non-overlapping paragraphs, because each one contains different information. Each paragraph has metadata attached:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.schema import Document


paragraphs_document_list = []

# paragraph_list is assumed to hold (paragraph_id, page number, text) tuples
for paragraph_id, pageno, paragraph in paragraph_list:
    paragraphs_document_list.append(
        Document(page_content=paragraph,
                 metadata=dict(paragraph_id=paragraph_id, page=pageno)))

# "gpt-4" is a chat model, not an embedding model, so use an embedding model here
db = FAISS.from_documents(documents=paragraphs_document_list,
                          embedding=OpenAIEmbeddings(model="text-embedding-ada-002"))

Normally, I query the document as a whole, for example by asking what it is about:

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0.0, model="gpt-4"),
    chain_type="stuff",
    retriever=db.as_retriever(),
    verbose=False,
)

label_output = qa_chain.run(query="What is this document about?")
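One reason a whole-document question can give a partial answer: with chain_type="stuff", the retrieved paragraphs are concatenated into a single prompt, and as_retriever() by default returns only a few of them (k=4, if I recall the default correctly), not the whole document. A plain-Python sketch of the stuffing step, using a hypothetical prompt template rather than LangChain's actual one:

```python
# Minimal sketch of what a "stuff" chain does: concatenate the retrieved
# paragraphs into one context string and put the question after it.
# The template wording is hypothetical, not LangChain's real prompt.


def stuff_prompt(retrieved_texts, question):
    """Join every retrieved paragraph into a single prompt string."""
    context = "\n\n".join(retrieved_texts)
    return f"Use the following context to answer.\n\n{context}\n\nQuestion: {question}"


# Pretend these are already sorted by similarity to the question.
paragraphs = ["Para A text.", "Para B text.", "Para C text.", "Para D text.", "Para E text."]
k = 4  # assumed retriever default: only the top-k paragraphs are stuffed
prompt = stuff_prompt(paragraphs[:k], "What is this document about?")
print(prompt)
```

Here "Para E text." never reaches the model, which is why a larger k (or a different chain type) may be needed for genuinely document-wide questions.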

Instead, I would like to retrieve the individual embeddings stored in my FAISS vector store and query each one separately, with a question such as "What's this paragraph about?".

Is there a way to query a specific embedding, or to use a single specific embedding as the retriever? Either way, I would also like access to the original paragraph being queried, together with its metadata.
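Since the paragraph you want to ask about is already known, retrieval is arguably unnecessary: each paragraph's text can go straight into the prompt, with its metadata kept next to the answer. Below is a minimal stdlib sketch; Doc mimics langchain.schema.Document and ask_llm is a hypothetical stand-in for a real model call. With LangChain's FAISS wrapper, the stored Document objects should be reachable through db.index_to_docstore_id and db.docstore.search(...), so those could replace the toy list.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for langchain.schema.Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real chat-model call (e.g. ChatOpenAI)."""
    # Echo the last prompt line (the paragraph text) so the flow is visible.
    return f"summary of: {prompt.splitlines()[-1][:30]}"


def describe_each(docs, question="What's this paragraph about?"):
    """Ask the same question about every paragraph, one call per paragraph."""
    results = []
    for doc in docs:
        prompt = f"{question}\n\nParagraph:\n{doc.page_content}"
        results.append({"metadata": doc.metadata,
                        "paragraph": doc.page_content,
                        "answer": ask_llm(prompt)})
    return results


# Toy data for illustration only.
docs = [Doc("First paragraph text.", {"paragraph_id": 18, "page": 5}),
        Doc("Second paragraph text.", {"paragraph_id": 19, "page": 5})]
for row in describe_each(docs):
    print(row["metadata"], "->", row["answer"])
```

Each result keeps the original paragraph and its metadata, which covers the second half of the question without any vector search.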

I tried filtering on metadata to restrict the answer to a specific paragraph:

query = "What's this paragraph about?"
filter_dict = {"paragraph_id": 19, "page": 5}

results = db.similarity_search(query, filter=filter_dict, k=1, fetch_k=1)
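If this call returns an empty list, one plausible cause is fetch_k=1. As far as I can tell from LangChain's FAISS wrapper, the metadata filter is applied after the index returns the fetch_k nearest vectors, so with fetch_k=1 only the single closest paragraph is checked, and if that is not paragraph 19 nothing survives. The toy search below reproduces that post-filtering order in plain Python (cosine similarity on made-up 2-D vectors; this is not FAISS itself):

```python
import math


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def similarity_search(store, query_vec, k=4, filter=None, fetch_k=20):
    """Toy search mimicking post-filtering: rank everything, keep the
    fetch_k nearest, THEN drop entries whose metadata does not match."""
    ranked = sorted(store, key=lambda item: cosine(item["vec"], query_vec), reverse=True)
    candidates = ranked[:fetch_k] if filter else ranked
    if filter:
        candidates = [c for c in candidates
                      if all(c["metadata"].get(key) == val for key, val in filter.items())]
    return candidates[:k]


store = [
    {"vec": (1.0, 0.0), "metadata": {"paragraph_id": 18, "page": 5}},
    {"vec": (0.0, 1.0), "metadata": {"paragraph_id": 19, "page": 5}},
]
query_vec = (0.9, 0.1)  # closest to paragraph 18, not 19
filter_dict = {"paragraph_id": 19, "page": 5}

print(similarity_search(store, query_vec, k=1, filter=filter_dict, fetch_k=1))  # []
print(similarity_search(store, query_vec, k=1, filter=filter_dict, fetch_k=len(store)))
```

Raising fetch_k (for example to the number of stored paragraphs) lets the filter see every candidate, at the cost of a larger search.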

Upvotes: 2

Answers (3)

Francisco Estrada

Reputation: 419

If you use Pinecone, see its metadata-filtering documentation: https://docs.pinecone.io/docs/metadata-filtering

With Pinecone, the vector store is created like this:

vectorstore = Pinecone(index, embed.embed_query, "text")

Upvotes: 0

Fernando Delgado Chaves

Reputation: 71

Here is one approach:

1. Look up the index IDs assigned within the vector database and store them in an iterable object.
2. Iterate through the list of index IDs and use each ID as part of the filter_dict.

results = db.similarity_search(query, filter=filter_dict, k=1, fetch_k=1)
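The steps above can be sketched with plain dictionaries. The layout is an assumption modelled on LangChain's FAISS wrapper (index_to_docstore_id mapping row numbers to docstore keys); the point is that once you hold the ID you can read the paragraph and its metadata directly, and only need filter_dict if you still want to go through similarity_search.

```python
import uuid

# Toy stand-ins (assumed layout): index_to_docstore_id maps a FAISS row
# number to a docstore key, and the docstore maps that key to the stored
# paragraph and its metadata.
docstore = {}
index_to_docstore_id = {}
paragraphs = [("First paragraph.", 18, 5), ("Second paragraph.", 19, 5)]
for row, (text, paragraph_id, page) in enumerate(paragraphs):
    key = str(uuid.uuid4())
    docstore[key] = {"page_content": text,
                     "metadata": {"paragraph_id": paragraph_id, "page": page}}
    index_to_docstore_id[row] = key


def iter_paragraphs():
    """Yield (row, document, filter_dict) for every stored paragraph."""
    for row, key in sorted(index_to_docstore_id.items()):
        doc = docstore[key]
        # Exact-match filter that would select only this paragraph.
        filter_dict = dict(doc["metadata"])
        yield row, doc, filter_dict


for row, doc, filter_dict in iter_paragraphs():
    print(row, filter_dict, doc["page_content"])
```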

Upvotes: 0

Vỹ CT

Reputation: 11

Currently, the LangChain documentation has a guide for the Chroma vector store that uses RetrievalQAWithSourcesChain to search over metadata. Another option is simply to pass filter=filter_dict through the search_kwargs parameter of as_retriever().
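In LangChain this would look like db.as_retriever(search_kwargs={"k": 1, "filter": filter_dict}), and that retriever can then be passed to RetrievalQA.from_chain_type. The sketch below only mimics the forwarding in plain Python (ToyRetriever and search are hypothetical, not LangChain classes) to show that the filter travels with the retriever, so every query only sees the matching paragraph.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyRetriever:
    """Minimal stand-in for a vector-store retriever: it stores search_kwargs
    and forwards them unchanged to the store's search function."""
    search_fn: Callable[..., list]
    search_kwargs: dict = field(default_factory=dict)

    def get_relevant_documents(self, query):
        return self.search_fn(query, **self.search_kwargs)


docs = [
    {"text": "First paragraph.", "metadata": {"paragraph_id": 18, "page": 5}},
    {"text": "Second paragraph.", "metadata": {"paragraph_id": 19, "page": 5}},
]


def search(query, k=4, filter=None):
    """Toy search: ignores the query's meaning, applies an exact-match filter."""
    hits = [d for d in docs
            if not filter or all(d["metadata"].get(key) == v for key, v in filter.items())]
    return hits[:k]


retriever = ToyRetriever(search, {"k": 1, "filter": {"paragraph_id": 19, "page": 5}})
print(retriever.get_relevant_documents("What's this paragraph about?"))
```

With FAISS specifically, adding a larger fetch_k to search_kwargs may also be needed, since the filter is applied to the fetched candidates.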

Upvotes: 1
