Is there any way I can guarantee that every document containing all of the query terms always scores higher than documents containing fewer of them?
Note that I don't want to switch to AND semantics. I still want to show results when no document matches all of the query terms.
One (safe, fast) thing you can try is to subclass DefaultSimilarity and adjust how the coordination factor is computed. By default it is a simple fraction, overlap / maxOverlap, so a document that matches only 2 of 3 query terms still gets 2/3 of the coordination factor of a document that matches all 3.
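Here is a minimal plain-Java sketch of that default fraction, with no Lucene dependency. `DefaultCoordSketch` and `defaultCoord` are hypothetical names. The method body mirrors the overlap / maxOverlap computation described above:

```java
public class DefaultCoordSketch {
    // Mirrors the default coordination factor: the fraction of query terms matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        int maxOverlap = 3; // a hypothetical three-term query
        for (int overlap = 1; overlap <= maxOverlap; overlap++) {
            // Print the factor each partial or full match would receive.
            System.out.printf("%d of %d terms -> coord = %.3f%n",
                    overlap, maxOverlap, defaultCoord(overlap, maxOverlap));
        }
    }
}
```

A document matching 2 of 3 terms is only penalized by a factor of 2/3, which other scoring factors can easily outweigh.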
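If matching all of the query terms is important to you, I suggest explicitly boosting documents that do so even further. The example below halves the coordination factor again (and therefore the score) for any document that doesn't match every query term. This makes full matches much more likely to rank first, but it is not a strict guarantee: coord is only one multiplicative factor, so a partial match with much higher term frequencies or idf values can still outscore a full match.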
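For example (the class name is hypothetical):
// Lucene 4.x package; in Lucene 3.x the class is org.apache.lucene.search.DefaultSimilarity
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class FullMatchBoostSimilarity extends DefaultSimilarity { // hypothetical name
    @Override
    public float coord(int overlap, int maxOverlap) {
        // Full matches keep 1; partial matches get half the default overlap/maxOverlap.
        return (overlap == maxOverlap)
            ? 1f
            : 0.5f * super.coord(overlap, maxOverlap);
    }
}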
This factor is described in more detail in the Lucene Similarity javadocs.
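To see what the coord override above changes, the sketch below recomputes both factors side by side in plain Java, without Lucene. `penalizedCoord` mirrors the override. The `fullMatchOther` and `partialMatchOther` values are made-up numbers standing in for the product of everything else in the score (tf, idf, norms, boosts). They are assumptions for illustration only, not real Lucene output.

```java
public class PenalizedCoordSketch {
    // Default coordination factor: fraction of query terms matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Mirrors the override: full matches keep 1, partial matches get half the default.
    static float penalizedCoord(int overlap, int maxOverlap) {
        return (overlap == maxOverlap) ? 1f : 0.5f * defaultCoord(overlap, maxOverlap);
    }

    public static void main(String[] args) {
        int maxOverlap = 3;
        // Hypothetical product of all the other scoring factors for two documents.
        float fullMatchOther = 1.0f;    // matches 3 of 3 terms
        float partialMatchOther = 2.0f; // matches 2 of 3 terms, but with higher tf/idf

        float fullDefault = fullMatchOther * defaultCoord(3, maxOverlap);             // 1.0
        float partialDefault = partialMatchOther * defaultCoord(2, maxOverlap);       // ~1.333
        float fullPenalized = fullMatchOther * penalizedCoord(3, maxOverlap);         // 1.0
        float partialPenalized = partialMatchOther * penalizedCoord(2, maxOverlap);   // ~0.667

        System.out.println("default:   full=" + fullDefault + " partial=" + partialDefault);
        System.out.println("penalized: full=" + fullPenalized + " partial=" + partialPenalized);
    }
}
```

With these numbers the override moves the full match back above the partial one. Raising `partialMatchOther` to 4.0 flips the order again (4 × 0.5 × 2/3 ≈ 1.33 > 1.0).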