Reputation: 849
Suppose I've got multiple Lucene indexes (not replicas) on several PCs.
I query each index and then merge the results. Is there any way to normalize the document scores so that I could sort by score (relevance)?
I mean, the score for document A from index A would not be comparable with the score for document B from index B unless I do some sort of normalization, right?
Thanks, Roey
Upvotes: 3
Views: 1794
Reputation: 20621
First, study the Lucene Similarity Documentation. Out of all the factors there, the only one that is different from one index to another is the inverse document frequency (idf).
I suggest you use Luke or a debugger to see the impact of the different indexes' idfs. You may find that this only has a minor influence.
Here is a discussion about using a global idf, and here is a wiki page about distributed search design in Solr. I believe the problem is not yet solved.
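To get a feel for how much the idf can differ between indexes, and what a global idf would look like, here is a minimal sketch. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); other Similarity implementations use different formulas. The function names and the term statistics are made up for illustration and are not part of any Lucene API.

```python
import math

def classic_idf(doc_freq, num_docs):
    # Assumed formula: classic Lucene TF-IDF idf, 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def global_idf(shard_stats):
    # shard_stats: one (doc_freq, num_docs) pair per index for the same term.
    # Summing the statistics treats all indexes as one big collection.
    total_df = sum(df for df, _ in shard_stats)
    total_docs = sum(n for _, n in shard_stats)
    return classic_idf(total_df, total_docs)

# Hypothetical statistics for one term in two indexes of 1000 documents each
shards = [(9, 1000), (99, 1000)]
local_idfs = [classic_idf(df, n) for df, n in shards]
shared_idf = global_idf(shards)

print("per-index idf:", local_idfs)
print("global idf:   ", shared_idf)
```

If the per-index values come out close to the global one for your real terms, the idf difference is probably not what dominates your ranking.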
Lucene scoring does not lend itself to simple normalization. I suggest you try to make the document distribution across the indexes as random as possible, and then compare how your hits from the two indexes rank.
Upvotes: 4
Reputation: 18918
To compare the score of document A across indexes X and Y, I compute
x = score(A, X) / (max score of any document that is a hit for the search on index X)
and
y = score(A, Y) / (max score of any document that is a hit for the search on index Y).
Both x and y are now between 0 and 1. Just add x and y to get the final score.
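As a rough sketch, the same per-index normalization can be applied to the merge case from the question, where each document lives in only one index. The hit lists and names below are hypothetical, and plain (doc_id, score) tuples stand in for whatever your search client returns.

```python
def normalize_hits(hits):
    """Divide each score by the top score of its own result list."""
    if not hits:
        return []
    max_score = max(score for _, score in hits)
    if max_score <= 0:
        # Avoid dividing by zero when no hit has a positive score
        return [(doc_id, 0.0) for doc_id, _ in hits]
    return [(doc_id, score / max_score) for doc_id, score in hits]

def merge_normalized(*result_lists):
    """Normalize each list separately, then sort all hits by normalized score."""
    merged = []
    for hits in result_lists:
        merged.extend(normalize_hits(hits))
    return sorted(merged, key=lambda hit: hit[1], reverse=True)

# Hypothetical results from two indexes with very different raw score ranges
hits_x = [("x1", 4.0), ("x2", 2.0)]
hits_y = [("y1", 0.5), ("y2", 0.4)]
ranked = merge_normalized(hits_x, hits_y)
print(ranked)
```

Note the weakness this makes visible: the best hit of every index ends up with a score of 1, even if one index has only weak matches for the query.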
This is a naive approach; I would like to hear your comments on it.
But I don't understand why you want to add the scores of two different documents. What is the use case?
Upvotes: -1