How to handle variable data during similarity search

Question

I have converted my category data ["departure station", "arrival station", "flight number"] into embeddings using Sentence Transformers with the all-MiniLM-L6-v2 model for a RAG pipeline. The same model converts the user query into an embedding, and I run a cosine similarity search against the category embeddings. I'm working with aviation data, so a user query can contain variable values. For example, a user might write "tell me current status of 2215" instead of "tell me departure and arrival station of flight number 2215".

The second sentence gets a similarity score of about 0.7 against my categories ("departure station", "arrival station", "flight number"), while the first sentence doesn't point to these categories and scores only ~0.2.
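
A minimal sketch of the comparison described above, to make the setup concrete. The helper names `cosine_similarity` and `score_categories` are hypothetical, and the `encode` argument is assumed to be any callable that maps a list of strings to a list of vectors (for example `SentenceTransformer("all-MiniLM-L6-v2").encode`). It is not necessarily how the actual pipeline is written.

```python
import math

CATEGORIES = ["departure station", "arrival station", "flight number"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def score_categories(query, categories, encode):
    """Embed the query and every category, then score each category.

    `encode` is an assumption: a callable taking a list of strings and
    returning one vector per string, in the same order.
    """
    vectors = encode([query] + list(categories))
    query_vec = vectors[0]
    return {
        category: cosine_similarity(query_vec, vec)
        for category, vec in zip(categories, vectors[1:])
    }


# Usage with the real model (requires `pip install sentence-transformers`):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   print(score_categories("tell me current status of 2215", CATEGORIES, model.encode))
```

Running both example queries through `score_categories` should show the gap described here, with the short query scoring much lower on every category.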

How can I handle variable data like "2215" in an embedding similarity search? Or should I use some other technique to solve this problem?
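
To illustrate what "handling the variable separately" could look like, here is a rough sketch that pulls numeric tokens out of the query before it is embedded, so the value (e.g. 2215) can be used for an exact lookup while only the remaining text goes through similarity search. The pattern and the `split_query` name are assumptions made for illustration: it treats any standalone run of 1 to 4 digits as a candidate flight number, which may not match real flight-number formats. It also does not by itself make "current status" match the categories.

```python
import re

# Assumption: a flight number appears as a standalone token of 1-4 digits.
NUMBER_TOKEN = re.compile(r"\b\d{1,4}\b")


def split_query(query):
    """Separate variable values from the text that will be embedded.

    Returns (text, values): the query with numeric tokens removed and
    whitespace normalised, plus the list of extracted tokens in order.
    """
    values = NUMBER_TOKEN.findall(query)
    text = NUMBER_TOKEN.sub(" ", query)
    text = " ".join(text.split())  # collapse the gaps left by removed tokens
    return text, values


text, values = split_query("tell me current status of 2215")
print(text)    # tell me current status of
print(values)  # ['2215']
```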

Answers (0)
