Reputation: 635
In Lucene's practical scoring function there is a query coordinator which punishes documents that fail to match all the query terms. does Okapi BM25 use the same trick?
The reason I'm curious about it is that I'm using Elasticsearch with BM25 similarity module and sometimes I feel this algorithm does not favor documents with more matches. There are cases that a document contains one or two terms a lot, outscores a document containing all query terms.
Upvotes: 1
Views: 329
Reputation: 33351
Yes and no.
No, it doesn't use a coord factor as described by the old Lucene default similarity (note: Lucene core now uses BM25 by default, as well).
Yes, it does weigh hits on more of the query terms more heavily than a bunch of hits on the same term. It does this with better term saturation, making the old coord factor effectively obsolete.
It is, however, always possible that many hits on less terms will outscore few hits on more terms using either algorithm.
Upvotes: 4