Reputation: 3270
I'm trying to compare the scores of two MLT (More Like This) queries, but I'm a bit confused by what I read here: https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html
Even though the intent of the query norm is to make results from different queries comparable, it doesn’t work very well. The only purpose of the relevance _score is to sort the results of the current query in the correct order. You should not try to compare the relevance scores from different queries.
Suppose I run an MLT query and document 'A' is similar to document 'B' with a score of 0.4, and conversely, running the MLT query the other way, document 'B' is similar to document 'A' with a score of 2.4.
I would expect the score to be the same based on the tokens matched in the MLT, but that's not the case.
Also, suppose I run an MLT query and document 'A' is similar to document 'B' with a score of 0.6, and then run another MLT query where document 'C' is similar to document 'A' with a score of 4.7.
So my questions are:
Thanks, - Phil
Upvotes: 3
Views: 1639
Reputation: 33351
1.
No, it doesn't. As you noted in your question, you should not compare the scores of different queries. If you want a meaningful answer to which documents are most similar to C, you should generate an MLT query for document C and search with that.
This is doubly true because of how MLT queries work. MLT attempts to generate a list of interesting terms from your document (based on the terms in the index) and then searches for them. The set of terms generated from document A may be very different from the set generated from document B, hence the wildly different scores when finding A from B and vice versa, even though the documents themselves obviously have the same overlap.
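To make the asymmetry concrete, here is a minimal toy sketch in Python. It is not Lucene's or Elasticsearch's actual implementation: the corpus, the simplified tf-idf weighting, and the helper names (`idf`, `interesting_terms`, `mlt_score`) are all assumptions made up for illustration. It only mimics the general idea of picking a few "interesting" terms from the source document and scoring the target against them.

```python
import math
from collections import Counter

# Hypothetical toy corpus; each document is an already-analyzed token list.
CORPUS = {
    "A": "elasticsearch scoring scoring scoring lucene query".split(),
    "B": "elasticsearch scoring lucene lucene lucene index shard".split(),
    "C": "index shard replica cluster".split(),
    "D": "query cluster node elasticsearch".split(),
}


def idf(term, corpus):
    """Simplified inverse document frequency (an assumption, not Lucene's exact formula)."""
    df = sum(1 for tokens in corpus.values() if term in tokens)
    return 1.0 + math.log(len(corpus) / (1.0 + df))


def interesting_terms(doc_id, corpus, max_terms=2):
    """Pick the top terms of the source document by tf * idf (ties broken alphabetically)."""
    tf = Counter(corpus[doc_id])
    ranked = sorted(tf, key=lambda t: (-tf[t] * idf(t, corpus), t))
    return ranked[:max_terms]


def mlt_score(source_id, target_id, corpus, max_terms=2):
    """Score the target against the terms selected from the source document."""
    terms = interesting_terms(source_id, corpus, max_terms)
    target_tf = Counter(corpus[target_id])
    return sum(target_tf[t] * idf(t, corpus) for t in terms)


if __name__ == "__main__":
    print("terms from A:", interesting_terms("A", CORPUS))
    print("terms from B:", interesting_terms("B", CORPUS))
    print("A -> B:", mlt_score("A", "B", CORPUS))
    print("B -> A:", mlt_score("B", "A", CORPUS))
```

In this toy setup, the terms selected from A and from B differ, so the "A finds B" score and the "B finds A" score differ as well, even though the two documents share the same overlapping tokens in both directions.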
2.
Don't. Listen to the docs. Scores are only designed to rank how well documents match the query that generated them. Using them outside that context is not meaningful. Rethink what you are trying to accomplish.
Upvotes: 3