tcotts

Reputation: 350

Langchain - using filters in a Retriever

Do any of the langchain retrievers provide filter arguments?

I'm trying to create an EnsembleRetriever from a vector-store retriever (FAISS) and a keyword retriever (BM25), but the metadata filter is lost when I combine them:

documents = [Document(page_content='The Celtics are my favourite team.', metadata={"topic": "sport"}),
     Document(page_content='The Boston Celtics won the game by 20 points', metadata={"topic": "sport"}),
     Document(page_content='This is just a random text.', metadata={"topic": "unknown"})]

# embeddings is any langchain embeddings
db = FAISS.from_documents(documents, embeddings) 
question = "Who is my favourite team?"
retriever = BM25Retriever.from_documents(documents)
faiss_retriever = db.as_retriever(search_kwargs={'filter': dict(topic="sport"), 'k': 4, 'fetch_k': 8})
er = EnsembleRetriever(retrievers=[retriever, faiss_retriever], weights=[0.3, 0.7])
results = er.get_relevant_documents(question)

How can I make sure the filter persists in the BM25 retriever?

Upvotes: 5

Views: 6588

Answers (1)

Anastasia Vishnyakova

Reputation: 179

You can wrap the FAISS search in a custom retriever that applies the filter itself.

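# Import paths differ between LangChain releases; adjust these to your installed version.
from typing import List

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

documents = [Document(page_content='The Celtics are my favourite team.', metadata=dict(topic="sport")),
     Document(page_content='The Boston Celtics won the game by 20 points', metadata=dict(topic="sport")),
     Document(page_content='This is just a random text.', metadata=dict(topic="unknown"))]

# embeddings is any langchain embeddings
db = FAISS.from_documents(documents, embeddings)
question = "Who is my favourite team?"
retriever = BM25Retriever.from_documents(documents)

class CustomRetriever(BaseRetriever):
    # Wraps the FAISS store so the metadata filter is applied on every query.
    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        return db.similarity_search(query, k=4, filter=dict(topic="sport"))

faiss_retriever = CustomRetriever()

er = EnsembleRetriever(retrievers=[retriever, faiss_retriever], weights=[0.3, 0.7])
results = er.get_relevant_documents(question)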
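
Note that the custom retriever above only filters the FAISS side; the BM25 retriever is still built from every document. One possible way to carry the filter over, sketched here under assumptions, is to filter the documents by metadata before building the BM25 index. To keep the sketch runnable without LangChain, `Doc` is a hypothetical stand-in for `Document`, and `filter_by_metadata` is an illustrative helper, not a LangChain function.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def filter_by_metadata(docs: Iterable[Doc], wanted: Dict[str, Any]) -> List[Doc]:
    """Keep only documents whose metadata matches every key/value pair in `wanted`."""
    return [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in wanted.items())
    ]


documents = [
    Doc("The Celtics are my favourite team.", {"topic": "sport"}),
    Doc("The Boston Celtics won the game by 20 points", {"topic": "sport"}),
    Doc("This is just a random text.", {"topic": "unknown"}),
]

# Only the documents tagged topic="sport" survive the filter.
sport_docs = filter_by_metadata(documents, {"topic": "sport"})

# Assumption: with LangChain installed and real Document objects, the filtered
# list would then feed the keyword retriever, e.g.
#   retriever = BM25Retriever.from_documents(sport_docs)
```

The same helper could instead be applied to `results` after the ensemble call as a post-filter, at the cost of possibly returning fewer documents than requested.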

Upvotes: 3
