I am trying to get the similarity scores of the documents returned by LangChain retrievers. Below is a snippet of my current retriever implementation, which returns documents without scores:
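```python
from langchain_community.vectorstores import FAISS

# docs, embeddings_model and query are defined elsewhere
vectorstore = FAISS.from_documents(docs, embeddings_model)
# k is read from search_kwargs, not from a bare k= keyword
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
result = semantic_retriever.invoke(query)  # list of Documents, no scores
```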
I looked it up, and LangChain documents a way to do this: write a custom retriever wrapper that adds the similarity score to each document's metadata (https://python.langchain.com/v0.2/docs/how_to/add_scores_retriever/).
That approach may work, but it does not make sense to me: why should a query-specific score become part of the document's permanent metadata? Also, what is the difference between `invoke` and `similarity_search_with_score`? This is LangChain 0.2.
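As far as I understand LangChain 0.2, the difference is in what comes back: with the default `search_type="similarity"`, `retriever.invoke(query)` calls the store's `similarity_search` and returns only `Document`s; `similarity_search_with_score` returns `(Document, score)` pairs where the score is the store's raw metric (for FAISS's default Euclidean setting, a distance, so lower means closer); and `similarity_search_with_relevance_scores` rescales that to roughly [0, 1], where higher means more relevant. The toy store below only mimics the shape of those three calls in plain Python. `Doc`, `ToyL2Store` and the example vectors are hypothetical, and the `1 - d / sqrt(2)` conversion is an assumption about LangChain's default for Euclidean stores (it presumes unit-length embeddings).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for LangChain's Document
    page_content: str
    metadata: dict = field(default_factory=dict)


class ToyL2Store:
    """Hypothetical toy store that mimics the shape of three VectorStore calls."""

    def __init__(self, docs, vectors):
        self.docs = docs
        self.vectors = vectors

    def similarity_search_with_score(self, query_vec, k=4):
        # Raw Euclidean distance: LOWER means MORE similar
        pairs = [(doc, math.dist(query_vec, vec)) for doc, vec in zip(self.docs, self.vectors)]
        return sorted(pairs, key=lambda pair: pair[1])[:k]

    def similarity_search(self, query_vec, k=4):
        # What retriever.invoke() returns by default: same ranking, scores dropped
        return [doc for doc, _ in self.similarity_search_with_score(query_vec, k)]

    def similarity_search_with_relevance_scores(self, query_vec, k=4):
        # Distance rescaled so HIGHER means MORE relevant; 1 - d / sqrt(2)
        # assumes unit-length embeddings (assumed LangChain Euclidean default)
        return [
            (doc, 1.0 - dist / math.sqrt(2))
            for doc, dist in self.similarity_search_with_score(query_vec, k)
        ]


if __name__ == "__main__":
    store = ToyL2Store(
        [Doc("cats"), Doc("dogs"), Doc("stocks")],
        [(1.0, 0.0), (0.6, 0.8), (0.0, 1.0)],  # unit-length toy embeddings
    )
    query = (1.0, 0.0)
    print([d.page_content for d in store.similarity_search(query, k=2)])  # ['cats', 'dogs']
    print(store.similarity_search_with_score(query, k=2))
    print(store.similarity_search_with_relevance_scores(query, k=2))
```

All three calls produce the same ranking; they differ only in whether a score is returned and which direction "better" points.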
Also, how do I get scores for `BM25Retriever` and `EnsembleRetriever` (imported with `from langchain.retrievers import EnsembleRetriever, BM25Retriever`)?
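For these two retrievers the scores are of a different kind. As far as I know, `BM25Retriever` in 0.2 wraps `rank_bm25`'s `BM25Okapi` (kept on its `vectorizer` attribute), so `retriever.vectorizer.get_scores(retriever.preprocess_func(query))` should give one raw BM25 score per document, in `retriever.docs` order; these are unbounded relevance scores, not similarities. `EnsembleRetriever` fuses the member retrievers' rankings with weighted Reciprocal Rank Fusion, adding `weight / (rank + c)` per list (with `c = 60` by default, if I read the source correctly), so its ordering comes from ranks rather than from the underlying scores. The stdlib sketch below reimplements both ideas; `bm25_scores` and `weighted_rrf` are hypothetical helpers, whitespace tokenization is an assumption, and the IDF uses the non-negative `log(1 + ...)` variant, which differs slightly from `BM25Okapi`'s.

```python
import math
from collections import Counter, defaultdict


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of every document in corpus for the query.

    Assumption: lowercase + whitespace tokenization; IDF is the non-negative
    variant log(1 + (N - n + 0.5) / (n + 0.5)).
    """
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def weighted_rrf(ranked_lists, weights, c=60):
    """Weighted Reciprocal Rank Fusion over lists of doc ids, best first."""
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank is used; each retriever's own score is discarded
            fused[doc_id] += weight / (rank + c)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    corpus = ["the cat sat", "dogs chase cats", "stock prices fell"]
    print(bm25_scores("cat", corpus))  # only the first document contains the token "cat"
    print(weighted_rrf([["a", "b", "c"], ["b", "a", "c"]], weights=[0.7, 0.3]))
```

Because the fused values depend only on ranks and weights, they are useful for ordering within one ensemble query but say nothing about absolute similarity.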
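This is how I ended up doing it. I did not add the score to the metadata, but I still had to go down to the vector store level to get the scores returned along with the documents. The overridden `invoke` returns `(Document, score)` pairs instead of plain `Document`s.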
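```python
from typing import Any, List, Optional, Tuple

from langchain_core.documents import Document
from langchain_core.runnables import RunnableConfig
from langchain_core.vectorstores import VectorStoreRetriever

max_embedding_length = 8000  # placeholder value (assumption): your embedding model's input limit


class CustomSemanticRetriever(VectorStoreRetriever):
    def invoke(
        self, input: str, config: Optional[RunnableConfig] = None, **kwargs: Any
    ) -> List[Tuple[Document, float]]:
        # Truncate overly long queries before they are embedded
        if len(input) > max_embedding_length:
            input = input[:max_embedding_length]
        # Query the vector store directly so each Document comes back with its score
        return self.vectorstore.similarity_search_with_relevance_scores(
            input, k=self.search_kwargs.get("k", 4)
        )
```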
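If something downstream (for example a chain) expects plain `Document`s rather than the `(Document, score)` tuples this retriever returns, one middle ground is to put the score on a copy of each document instead of the stored one. Depending on the vector store, the `Document` returned by a search may be the same object the docstore keeps, so writing the score into its metadata in place could carry it into later, unrelated queries; copying sidesteps that. A minimal sketch under those assumptions, with a hypothetical `Doc` dataclass standing in for `Document` and a hypothetical `with_score_metadata` helper:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for LangChain's Document (page_content + metadata)
    page_content: str
    metadata: dict = field(default_factory=dict)


def with_score_metadata(results, key="score"):
    """Turn (doc, score) pairs into scored copies, leaving the originals untouched."""
    scored = []
    for doc, score in results:
        doc_copy = copy.deepcopy(doc)  # the store may hand back the object it keeps
        doc_copy.metadata[key] = score
        scored.append(doc_copy)
    return scored


if __name__ == "__main__":
    stored = Doc("cats", {"source": "a.txt"})
    print(with_score_metadata([(stored, 0.92)])[0].metadata)  # {'source': 'a.txt', 'score': 0.92}
    print(stored.metadata)  # {'source': 'a.txt'}
```

The score then lives only on the per-query result, which is closer to what a query-specific value should be.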