sup1214

Reputation: 1

How to Resolve Duplicate Vector Matches in Redis Vector Store Using Langchain Framework?

I'm currently using Redis Vector Store in conjunction with the Langchain framework. My application is configured to retrieve four distinct chunks, but I've noticed that sometimes all four chunks are identical. This is causing some inefficiencies and isn't the expected behavior. Does anyone know why this might be happening and have any recommendations on how to resolve it?

import os

from langchain.vectorstores.redis import Redis

# NOTE: `vectorstore` (the list of supported backends), `embedding()` and the
# module-level `index_name` used in getRelatedDocs are defined elsewhere in utils.py.
def getVectorStore(database: str, index_name: str = "KU_RULE_05") -> Redis:
    if database not in vectorstore:
        raise ValueError(f"{database} does not exist in vectorstore list in utils.py")

    if database == "Redis":
        VectorStore = Redis.from_existing_index(
            embedding=embedding(),
            redis_url=os.getenv("REDIS_URL"),
            index_name=index_name)

    return VectorStore


def getRelatedDocs(content: str, database="Redis"):
    VectorStore = getVectorStore(database=database, index_name=index_name)
    RelatedDocs = []

    for index, documents in enumerate(VectorStore.similarity_search(query=content)):
        RelatedDocs.append("{}: {}".format(index + 1, documents.page_content))
    return RelatedDocs

We've thoroughly checked the database for duplicate documents to see whether that could be the cause, but found none.
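For reference, a duplicate check along these lines can be scripted with redis-py; the sketch below assumes Langchain's default "doc:<index_name>:" key prefix and "content" field, so adjust it to your schema.

import hashlib

import redis

r = redis.Redis.from_url("redis://localhost:6379")

# Group stored chunks by a hash of their text content
seen = {}
for key in r.scan_iter(match="doc:KU_RULE_05:*"):
    content = r.hget(key, "content") or b""
    digest = hashlib.sha256(content).hexdigest()
    seen.setdefault(digest, []).append(key)

duplicates = {d: keys for d, keys in seen.items() if len(keys) > 1}
print(f"{len(duplicates)} duplicated contents found")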

Upvotes: 0

Views: 1434

Answers (1)

Spartee

Reputation: 21

OK, so most likely you are still using the from_documents method in your getVectorStore function when you should actually be using the from_existing_index method. You're likely re-generating and uploading the embeddings on every call, each time under a new UUID, hence the duplicates.
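To make that failure mode concrete, here is a hypothetical version of getVectorStore showing the pattern described above (the embedding class and URL are placeholders, not your actual code):

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis

# Anti-pattern: from_documents re-embeds `docs` and writes them to Redis
# again on every call, each chunk under a freshly generated key/UUID,
# so every request adds another copy of the same content to the index.
def getVectorStore_bad(docs, index_name: str = "KU_RULE_05") -> Redis:
    return Redis.from_documents(
        docs,
        OpenAIEmbeddings(),
        redis_url="redis://localhost:6379",
        index_name=index_name,
    )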

The flow for reusing an index once it has been created (as from_documents does) is:

  1. from_existing_index (make sure to pass the schema if you're using metadata)
  2. then either use it as a retriever in a chain via as_retriever, or use the search methods directly, e.g. similarity_search (a retriever sketch appears further below).

Example:


from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis

# any Embeddings implementation works here; OpenAIEmbeddings is just one option
embeddings = OpenAIEmbeddings()

metadata = [
    {
        "user": "john",
        "age": 18,
        "job": "engineer",
        "credit_score": "high",
    },
    {
        "user": "derrick",
        "age": 45,
        "job": "doctor",
        "credit_score": "low",
    }
]
texts = ["foo", "foo"]

rds = Redis.from_texts(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users"
)
results = rds.similarity_search("foo")
print(results[0].page_content)

Then, to initialize an index that already exists, you can do:



new_rds = Redis.from_existing_index(
    embeddings,
    index_name="users",
    redis_url="redis://localhost:6379",
    schema="redis_schema.yaml"
)
results = new_rds.similarity_search("foo", k=3)
print(results[0].metadata)

Notice that I'm passing the schema above. If you're using metadata, you can write out the schema file using the write_schema method.

# write the schema to a yaml file
new_rds.write_schema("redis_schema.yaml")
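If you'd rather go the retriever route (step 2 in the flow above), a minimal sketch looks like this; the RetrievalQA chain and OpenAI LLM are only placeholders for whatever chain and model you actually use:

from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Wrap the existing vector store as a retriever returning the top 4 chunks
retriever = new_rds.as_retriever(search_kwargs={"k": 4})

# Hypothetical chain setup; swap in your own LLM and chain type
qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=retriever)
print(qa.run("What do we know about john?"))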

I highly recommend going through the documentation for the newer release of the Redis integration as well:

https://python.langchain.com/docs/integrations/vectorstores/redis

OK, now given your code, I'm not positive what's actually causing this error, since you're sure you've curated your database contents, but could you try:


def getRelatedDocs(content: str, database="Redis"):
    VectorStore = getVectorStore(database=database, index_name=index_name)
    RelatedDocs = []

    docs = VectorStore.similarity_search(query=content)
    for i, document in enumerate(docs, start=1):
        RelatedDocs.append(f"{i}: {document.page_content}")
    return RelatedDocs

If this doesn't work, I would try running simpler examples with your codebase and see whether a more trivial case works.
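For example, a minimal end-to-end sanity check could look like the following (a local Redis instance and OpenAI embeddings are assumed here; swap in whatever you already have configured):

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis

embeddings = OpenAIEmbeddings()

# Build a tiny index once with known, distinct texts
rds = Redis.from_texts(
    ["alpha", "beta", "gamma", "delta"],
    embeddings,
    redis_url="redis://localhost:6379",
    index_name="sanity_check",
)
rds.write_schema("sanity_schema.yaml")

# Re-open the same index without re-uploading anything
store = Redis.from_existing_index(
    embeddings,
    index_name="sanity_check",
    redis_url="redis://localhost:6379",
    schema="sanity_schema.yaml",
)

# All four results should be distinct; if duplicates show up here,
# the problem is in the indexing path rather than in the retrieval code.
results = store.similarity_search("alpha", k=4)
print([doc.page_content for doc in results])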

Upvotes: 0
