Reputation: 350
Do any of the langchain retrievers provide filter arguments?
I'm trying to create an EnsembleRetriever from a vector store retriever (FAISS) and a keyword retriever (BM25), but the filter fails when I combine them:
documents = [Document(page_content='The Celtics are my favourite team.', metadata=dict(topic="sport")),
             Document(page_content='The Boston Celtics won the game by 20 points', metadata=dict(topic="sport")),
             Document(page_content='This is just a random text.', metadata=dict(topic="unknown"))]
# embeddings is any LangChain embeddings instance
db = FAISS.from_documents(documents, embeddings)
question = "Who is my favourite team?"
retriever = BM25Retriever.from_documents(documents)
faiss_retriever = db.as_retriever(search_kwargs={'filter': dict(topic="sport"), 'k': 4, 'fetch_k': 8})
er = EnsembleRetriever(retrievers=[retriever, faiss_retriever], weights=[0.3, 0.7])
results = er.get_relevant_documents(question)
How can I make sure the filter persists in the BM25 retriever?
Upvotes: 5
Views: 6588
Reputation: 179
You can wrap the FAISS store in a custom retriever that applies the filter on every search:
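from typing import List

# Import paths assume a recent langchain-core release; older versions may expose these elsewhere.
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

documents = [Document(page_content='The Celtics are my favourite team.', metadata=dict(topic="sport")),
             Document(page_content='The Boston Celtics won the game by 20 points', metadata=dict(topic="sport")),
             Document(page_content='This is just a random text.', metadata=dict(topic="unknown"))]
# embeddings is any LangChain embeddings instance
db = FAISS.from_documents(documents, embeddings)
question = "Who is my favourite team?"
retriever = BM25Retriever.from_documents(documents)

class CustomRetriever(BaseRetriever):
    # Delegates to the FAISS store so the metadata filter is applied on every query.
    def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun) -> List[Document]:
        return db.similarity_search(query, k=4, filter=dict(topic="sport"))

faiss_retriever = CustomRetriever()
er = EnsembleRetriever(retrievers=[retriever, faiss_retriever], weights=[0.3, 0.7])
results = er.get_relevant_documents(question)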
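As far as I can tell, BM25Retriever itself takes no filter argument, so one way to keep the filter on the BM25 side is to drop the unwanted documents before building its index. Below is a minimal sketch of that pre-filtering step. The `Doc` dataclass and the `filter_by_metadata` helper are hypothetical names introduced here for illustration (they are not part of langchain); `Doc` only stands in for langchain's Document so the snippet runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Doc:
    """Hypothetical stand-in for langchain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def filter_by_metadata(docs: Iterable[Any], **criteria: Any) -> List[Any]:
    """Keep only the docs whose metadata matches every key/value in criteria."""
    return [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in criteria.items())
    ]


documents = [
    Doc("The Celtics are my favourite team.", {"topic": "sport"}),
    Doc("The Boston Celtics won the game by 20 points", {"topic": "sport"}),
    Doc("This is just a random text.", {"topic": "unknown"}),
]

# Only the two "sport" documents survive the filter.
sport_docs = filter_by_metadata(documents, topic="sport")

# Assumption: with langchain installed and real Document objects, the filtered
# list would be indexed in place of the full corpus, e.g.
# retriever = BM25Retriever.from_documents(sport_docs)
```

The trade-off is that the filter is fixed when the index is built, so a different filter value needs its own BM25 index. Filtering the ensemble's results after retrieval would also work, but documents that get discarded would still take up slots in each retriever's top k.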
Upvotes: 3