Reputation: 1
I have uploaded documents with metadata to Vertex AI Search and Conversation, and I am using LangChain to build a RAG pipeline on top of them. I would like to filter the documents used to generate the answer based on the metadata I uploaded.
The metadata is in this format:
{
  "id": 1,
  "structData": {
    "id": "2743",
    "tags": "Operation",
    "language": "English"
  },
  "content": {"mimeType": "text/plain", "uri": #link to gcs location of the file}
}
For example, I would like to get an answer only based on documents that have "English" as "language".
I use this code to get the answer to my query using langchain:
import vertexai
from langchain.llms import VertexAI
from langchain.chains import RetrievalQA
from langchain.retrievers import GoogleVertexAISearchRetriever

vertexai.init(project=PROJECT_ID, location=REGION)
llm = VertexAI(model_name=MODEL)

retriever = GoogleVertexAISearchRetriever(
    project_id=PROJECT_ID,
    location_id=DATA_STORE_LOCATION,
    data_store_id=DATA_STORE_ID,
    get_extractive_answers=True,
    max_documents=10,
    max_extractive_segment_count=1,
    max_extractive_answer_count=5,
)

search_query = "What are some spices that can help lower blood sugar?"
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True
)
results = retrieval_qa({"query": search_query})
I already tried passing a "filter" argument to the retriever, as described in the LangChain documentation, but I always run into errors, no matter which syntax I use.
Upvotes: 0
Views: 566
Reputation: 11
I assume you're using Google Cloud, so make sure you pass the filter to the retriever as a filter expression string, like this:
retriever = GoogleVertexAISearchRetriever(
    filter="language: ANY(\"English\")",
    project_id="123",
    ...
)
You can check this link for more information about filter expressions.
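If you need more than one condition, it may help to build the filter string programmatically rather than escaping quotes by hand. The following is only a sketch: `any_filter` and `all_of` are hypothetical helper names (not part of LangChain or any Google library), and it assumes that `language` and `tags` are filterable fields in your data store schema. Values containing double quotes are rejected because escaping rules are not covered here.

```python
from typing import Iterable


def any_filter(field: str, values: Iterable[str]) -> str:
    """Build a `field: ANY("v1", "v2")` expression (hypothetical helper)."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        if '"' in value:
            # Assumption: we do not try to escape quotes, we just refuse them.
            raise ValueError(f"unsupported character in value: {value!r}")
    quoted = ", ".join(f'"{value}"' for value in values)
    return f"{field}: ANY({quoted})"


def all_of(*expressions: str) -> str:
    """Join several filter expressions with AND (hypothetical helper)."""
    return " AND ".join(expressions)


language_filter = any_filter("language", ["English"])
combined_filter = all_of(language_filter, any_filter("tags", ["Operation"]))

print(language_filter)   # language: ANY("English")
print(combined_filter)   # language: ANY("English") AND tags: ANY("Operation")

# The resulting string would then be passed as the retriever's filter, e.g.
# GoogleVertexAISearchRetriever(filter=language_filter, project_id=..., ...)
```

The single-condition result is the same string as the hand-written filter in the answer above, so it can be passed to the retriever unchanged.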
Upvotes: 1