Similarity search using Langchain Chroma not returning relevant results

Question

I am using LangChain with Chroma DB to store and retrieve data.

The data in the vector DB is in French and was embedded with OpenAI embeddings. A typical record looks like this:

Le code suivant: 84431390 décrit: Machines et appareils à imprimer offset (sauf alimentés en feuilles ou en bobines)

The ultimate goal is to build a chat assistant. For now, I have isolated an issue with the similarity search in Chroma, which performs poorly when I search for a numerical code like the one in the record above. For instance, if I give the following input query:

 code suivant : 84823000

I would expect to get the record containing that code, but I get the following results instead:

'Le code suivant : 84864000 décrit: Machines et appareils visés à la note 11 C du chapitre 84'

'Le code suivant : 84483900 décrit: Parties et accessoires des machines du n° 8445, n.d.a.'

'Le code suivant : 84313900 décrit: Parties de machines et appareils du n° 8428, n.d.a.'

Is it inherently hard for similarity search to match a numerical code, or is there something else that I am missing?
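
To make the expected behavior concrete, here is a minimal sketch of the exact-match lookup I have in mind, written in plain Python with no Chroma or OpenAI calls. The `records` list is a hypothetical in-memory copy of a few stored texts taken from this question, and `extract_code` and `exact_lookup` are illustrative helper names, not LangChain or Chroma APIs. It assumes codes are always 8 digits.

```python
import re

# Hypothetical in-memory copy of a few stored records (texts from the question).
records = [
    "Le code suivant : 84864000 décrit: Machines et appareils visés à la note 11 C du chapitre 84",
    "Le code suivant : 84483900 décrit: Parties et accessoires des machines du n° 8445, n.d.a.",
    "Le code suivant : 84313900 décrit: Parties de machines et appareils du n° 8428, n.d.a.",
    "Le code suivant: 84431390 décrit: Machines et appareils à imprimer offset (sauf alimentés en feuilles ou en bobines)",
]


def extract_code(query):
    """Return the first standalone 8-digit code in the query, or None."""
    match = re.search(r"\b\d{8}\b", query)
    return match.group(0) if match else None


def exact_lookup(query, docs):
    """Return the docs that contain the query's code as a whole token."""
    code = extract_code(query)
    if code is None:
        return []
    pattern = re.compile(rf"\b{code}\b")
    return [doc for doc in docs if pattern.search(doc)]


if __name__ == "__main__":
    for hit in exact_lookup("code suivant : 84431390", records):
        print(hit)
```

With this kind of lookup, a query for a code returns only the record holding exactly that code, and a code that is not in the list returns nothing. That is the behavior I was hoping to get from the similarity search.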

