Reputation: 20372
I understand there are many distance measures for computing the distance between two vectors (embeddings). However, which one is best when comparing two vectors for semantic similarity that have been generated with the sentence-transformers library? Or is there no consensus on this topic?
e.g., this link uses cosine similarity.
Upvotes: 1
Views: 2016
Reputation: 31
Different embedding models may be optimized for different downstream tasks or use cases. You should always check whether the model is optimized for a dot-product score, cosine similarity, or plain L2 distance.
Note: I have seen people use a cosine similarity score on embeddings extracted from a model optimized for dot product.
Also, the pre-trained models listed on sbert.net generally mention the dataset and the similarity score they were trained/fine-tuned for.
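To see why the choice matters, here is a minimal sketch (plain numpy, with made-up vectors standing in for real embeddings) showing that dot product and cosine similarity disagree on unnormalized vectors but coincide once the vectors are L2-normalized. Some sentence-transformers models emit normalized embeddings, in which case the two scores are interchangeable:

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (||u|| * ||v||)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for two sentence embeddings
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 0.0, 1.0])

dot = float(np.dot(u, v))      # raw dot product, scale-sensitive
cos = cosine_similarity(u, v)  # in [-1, 1], scale-invariant

print(dot, cos)  # the two scores differ on unnormalized vectors

# After L2 normalization, the dot product equals the cosine similarity
u_n = u / np.linalg.norm(u)
v_n = v / np.linalg.norm(v)
print(float(np.dot(u_n, v_n)))  # same value as cos above
```

So whenever a model card does not say the embeddings are normalized, it is safest to use exactly the score the model was trained with.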
Hope this helps!
Upvotes: 1