morpheus

Reputation: 20372

What is the best distance measure to use when doing semantic search on the embeddings generated by sentence transformers?

I understand there are many distance measures for computing the distance between two vectors (embeddings). However, which one is best when comparing two vectors for semantic similarity that have been generated with the sentence-transformers library? Or is there no consensus on this topic?

For example, this link uses cosine similarity.

Upvotes: 1

Views: 2016

Answers (1)

Anubhav Chhabra

Reputation: 31

Different embedding models may be optimized for different downstream tasks or use cases. You should always check whether the model is optimized for a dot-product score, cosine similarity, or plain L2 (Euclidean) distance.
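For example, with the sentence-transformers library you can compute all three scores and compare the ranking against what the model card recommends. This is a minimal sketch; the model name and sentences are only placeholders, so swap in whatever model you actually use.

```python
import torch
from sentence_transformers import SentenceTransformer, util

# Example model; check its model card for the recommended score function.
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How do I measure semantic similarity?"
docs = [
    "Cosine similarity compares the angle between two embedding vectors.",
    "The weather is nice today.",
]

query_emb = model.encode(query, convert_to_tensor=True)  # shape: (dim,)
doc_embs = model.encode(docs, convert_to_tensor=True)    # shape: (len(docs), dim)

# Cosine similarity: magnitude-invariant, scores in [-1, 1].
cos_scores = util.cos_sim(query_emb, doc_embs)

# Dot product: magnitude-sensitive; suited to models trained with a dot-product objective.
dot_scores = util.dot_score(query_emb, doc_embs)

# Euclidean (L2) distance: smaller means more similar.
l2_dists = torch.cdist(query_emb.unsqueeze(0), doc_embs)

print(cos_scores)
print(dot_scores)
print(l2_dists)
```

Note that if the embeddings are L2-normalized, cosine similarity and dot product give the same ranking, so the choice matters most for models that do not normalize their outputs.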

Note: I have seen people use a cosine similarity score on embeddings extracted from a model that was optimized for dot product.

Also, the pre-trained models listed on sbert.net generally mention the dataset and the score function they were trained/fine-tuned with.

Hope this helps!

Upvotes: 1
