Reputation: 71
I have built a vectorstore from a series of paragraphs taken from a text document. The text was deliberately split into non-overlapping paragraphs, because each one carries different information. Each paragraph has metadata (its paragraph ID and page number), which I included when creating the documents:
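from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.schema import Document

# paragraph_list is assumed to hold (paragraph_id, page_number, text) tuples
paragraphs_document_list = []
for paragraph_id, pageno, paragraph in paragraph_list:
    # One Document per paragraph, tagged with its ID and page number
    paragraphs_document_list.append(
        Document(page_content=paragraph,
                 metadata=dict(paragraph_id=paragraph_id, page=pageno))
    )

# "gpt-4" is a chat model, not an embedding model, so use an embedding model here
db = FAISS.from_documents(documents=paragraphs_document_list,
                          embedding=OpenAIEmbeddings(model="text-embedding-ada-002"))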
Normally, I can query the document as a whole by asking a question about its general content:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0.0, model="gpt-4"),
    chain_type="stuff",
    retriever=db.as_retriever(),  # the FAISS store built above
    verbose=False,
)
label_output = qa_chain.run(query="What is this document about?")
However, I would like to instead retrieve the individual embeddings stored in my FAISS vectorstore and query each of them separately, with a query such as "What's this paragraph about?".
Is there a way to query one specific embedding, or to use a single specific embedding as the retriever? Either way, I would like access to the original paragraph I'm querying, together with its metadata.
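If the goal is simply to ask the same question of every paragraph, a retriever may not be needed at all: the paragraphs and their metadata are already in hand. (In LangChain's FAISS wrapper they can, as far as I know, also be read back through `db.index_to_docstore_id` and `db.docstore.search(...)`; check this against your installed version.) The plain-Python sketch below sends each paragraph to the model on its own and keeps its metadata next to the answer. `Doc` is a stand-in for `Document`, and `ask` is a hypothetical prompt-to-answer callable, not a LangChain API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Stand-in for langchain.schema.Document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, int] = field(default_factory=dict)


PROMPT = "What's this paragraph about?\n\nParagraph:\n{text}"


def ask_per_paragraph(docs: List[Doc], ask: Callable[[str], str]) -> List[dict]:
    """Send each paragraph to the model separately and keep its metadata.

    `ask` is a hypothetical callable (prompt -> answer); with LangChain it
    could wrap something like ChatOpenAI(...).predict.
    """
    results = []
    for doc in docs:
        answer = ask(PROMPT.format(text=doc.page_content))
        # Keep the original text and metadata alongside the answer
        results.append({"answer": answer, "text": doc.page_content, **doc.metadata})
    return results


if __name__ == "__main__":
    docs = [
        Doc("Revenue grew 10% in 2022.", {"paragraph_id": 18, "page": 5}),
        Doc("Costs were flat over the same period.", {"paragraph_id": 19, "page": 5}),
    ]
    # Fake model for demonstration: echoes the first 20 characters of the paragraph
    fake_ask = lambda prompt: "summary of: " + prompt.split("Paragraph:\n")[1][:20]
    for row in ask_per_paragraph(docs, fake_ask):
        print(row["paragraph_id"], row["page"], row["answer"])
```

Because every call carries exactly one paragraph, each answer is unambiguously tied to one `paragraph_id`/`page` pair, which is the pairing the question asks for.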
I tried filtering on metadata so that the answer is based on one specific paragraph:
query = "What's this paragraph about?"
filter_dict = {"paragraph_id": 19, "page": 5}
results = db.similarity_search(query, filter=filter_dict, k=1, fetch_k=1)
Upvotes: 2
Views: 4631
Reputation: 419
If you use Pinecone, see its metadata-filtering docs: https://docs.pinecone.io/docs/metadata-filtering
vectorstore = Pinecone(index, embed.embed_query, "text")
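Pinecone's filter language uses MongoDB-style operators such as `$eq` and `$in`, and conditions on different keys are, as far as I know, combined with AND. A small helper (hypothetical name) that turns a plain equality dict like the question's `filter_dict` into that explicit operator form could look like this; the result would then be passed as `filter=` to the vectorstore's `similarity_search`.

```python
from typing import Any, Dict


def to_pinecone_filter(filter_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"paragraph_id": 19, "page": 5} into Pinecone's explicit operator form.

    Scalar values become $eq clauses; list values become $in clauses.
    """
    clauses: Dict[str, Any] = {}
    for key, value in filter_dict.items():
        if isinstance(value, list):
            clauses[key] = {"$in": value}
        else:
            clauses[key] = {"$eq": value}
    return clauses


if __name__ == "__main__":
    print(to_pinecone_filter({"paragraph_id": 19, "page": 5}))
    # Hypothetical usage (not executed here):
    # vectorstore.similarity_search(query, k=1, filter=to_pinecone_filter(filter_dict))
```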
Upvotes: 0
Reputation: 71
Here is an answer:
results = db.similarity_search(query, filter=filter_dict, k=1, fetch_k=1)
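A caveat with `fetch_k=1`: in the LangChain FAISS wrapper versions I'm aware of, the metadata `filter` is applied after FAISS returns the `fetch_k` nearest neighbours. The call above therefore only returns paragraph 19 if it happens to be the single closest match to the query; otherwise the result is empty. The plain-Python sketch below mimics that fetch-then-filter order with toy 2-D vectors (all names are hypothetical) and contrasts it with a direct metadata lookup that skips similarity ranking entirely.

```python
import math
from typing import Dict, List, Tuple

# Each entry: (embedding, text, metadata). Toy 2-D vectors stand in for real embeddings.
Entry = Tuple[List[float], str, Dict[str, int]]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def matches(entry: Entry, filter_dict: Dict[str, int]) -> bool:
    return all(entry[2].get(key) == value for key, value in filter_dict.items())


def search_then_filter(store: List[Entry], query_vec: List[float],
                       filter_dict: Dict[str, int], k: int = 1, fetch_k: int = 1) -> List[Entry]:
    """Take the fetch_k nearest entries, drop non-matching metadata, keep k."""
    ranked = sorted(store, key=lambda e: cosine(query_vec, e[0]), reverse=True)
    return [e for e in ranked[:fetch_k] if matches(e, filter_dict)][:k]


def lookup_by_metadata(store: List[Entry], filter_dict: Dict[str, int]) -> List[Entry]:
    """Exact metadata lookup, with no similarity ranking at all."""
    return [e for e in store if matches(e, filter_dict)]


if __name__ == "__main__":
    store = [
        ([1.0, 0.0], "para 18", {"paragraph_id": 18, "page": 5}),
        ([0.0, 1.0], "para 19", {"paragraph_id": 19, "page": 5}),
    ]
    query_vec = [0.9, 0.1]  # closest to paragraph 18
    wanted = {"paragraph_id": 19, "page": 5}
    print(search_then_filter(store, query_vec, wanted, fetch_k=1))  # nearest is 18, so empty
    print(search_then_filter(store, query_vec, wanted, fetch_k=2))  # now finds 19
    print(lookup_by_metadata(store, wanted))
```

Raising `fetch_k` (up to the number of stored paragraphs) or, when the IDs are already known, looking the paragraph up by metadata directly avoids the empty result.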
Upvotes: 0
Reputation: 11
Currently, the LangChain documentation has a guide for the Chroma vectorstore that uses the RetrievalQAWithSourcesChain to search with metadata. Another, simpler option is to pass filter=filter_dict inside the search_kwargs parameter of the as_retriever() function.
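Building on the `search_kwargs` suggestion, one retriever can be pinned to each paragraph. A minimal sketch, assuming the `paragraph_id`/`page` metadata from the question; the helper name is hypothetical. `fetch_k` is included because with FAISS it bounds how many neighbours are considered before filtering, so setting it to the number of stored paragraphs is the safe choice. The commented LangChain usage is not executed here.

```python
from typing import Dict, Iterable, List


def per_paragraph_search_kwargs(metadata: Iterable[Dict[str, int]],
                                fetch_k: int) -> List[dict]:
    """One search_kwargs dict per paragraph, pinning a retriever to that paragraph."""
    return [
        {"k": 1, "fetch_k": fetch_k,
         "filter": {"paragraph_id": m["paragraph_id"], "page": m["page"]}}
        for m in metadata
    ]


# Hypothetical usage with LangChain (not executed here):
# for kwargs in per_paragraph_search_kwargs(all_metadata, fetch_k=len(all_metadata)):
#     qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff",
#                                      retriever=db.as_retriever(search_kwargs=kwargs))
#     print(kwargs["filter"], qa.run("What's this paragraph about?"))

if __name__ == "__main__":
    print(per_paragraph_search_kwargs([{"paragraph_id": 19, "page": 5}], fetch_k=100))
```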
Upvotes: 1