Gherman

Reputation: 7436

Why doesn't Sphinx have BM25 with field weights?

The formula for Sphinx's default ranker, SPH_RANK_PROXIMITY_BM25, looks like this:

SPH_RANK_PROXIMITY_BM25 = sum(lcs*user_weight)*1000+bm25

The longest common subsequence (lcs) is computed for each field separately and then multiplied by that field's user_weight. However, bm25 is just a document-wide value and does not take the per-field weights into account. Why is that?
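To make the asymmetry concrete, here is a small sketch of the formula above with made-up numbers. The function name, the field names, the lcs values and the bm25 value are all hypothetical, and I'm assuming a field with no explicit weight counts as 1:

```python
def proximity_bm25(lcs_by_field, user_weights, bm25):
    """sum(lcs*user_weight)*1000 + bm25, following the formula above."""
    # Per-field part: each field's lcs is scaled by that field's weight.
    # Assumption: a field without an explicit weight uses 1.
    phrase_part = sum(
        lcs * user_weights.get(field, 1)
        for field, lcs in lcs_by_field.items()
    )
    # Document-wide part: bm25 is added once and never sees the weights.
    return phrase_part * 1000 + bm25


# Hypothetical document: lcs of 2 in "title", lcs of 1 in "content".
lcs = {"title": 2, "content": 1}
print(proximity_bm25(lcs, {"title": 3, "content": 1}, bm25=640))  # 7640
print(proximity_bm25(lcs, {"title": 1, "content": 1}, bm25=640))  # 3640
```

Changing the weights moves only the thousands part; the trailing 640 stays the same no matter which field the keywords were found in.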

Upvotes: 0

Views: 160

Answers (1)

Manticore Search

Reputation: 1482

Simply because it's faster, and in many cases the quality is good enough. If you need field weights in the BM25 part, use the expression ranker (SPH_RANK_EXPR) with bm25f() in the ranking formula. Document length is also not taken into account by default; that requires index_field_lengths=1 at indexing time.
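For anyone wondering what the field-weighted variant does differently, here is a rough sketch of BM25F for a single query term. This is the textbook "simple BM25F" form, not Sphinx's actual code; the function name, field names, lengths and idf value are made up for illustration:

```python
def bm25f_term(tf_by_field, len_by_field, avg_len_by_field,
               field_weights, idf, k1=1.2, b=0.75):
    """Score one query term against one document, textbook BM25F style."""
    weighted_tf = 0.0
    for field, tf in tf_by_field.items():
        # Per-field length normalization; with b=0 this factor is 1,
        # i.e. field lengths are ignored entirely.
        norm = 1.0 - b + b * len_by_field[field] / avg_len_by_field[field]
        # Field weights scale the term frequency before saturation.
        weighted_tf += field_weights.get(field, 1.0) * tf / norm
    # Usual BM25 saturation applied to the combined pseudo-frequency.
    return idf * weighted_tf / (k1 + weighted_tf)


# Hypothetical index statistics.
lengths = {"title": 5, "content": 200}
avg_lengths = {"title": 5, "content": 200}
weights = {"title": 3.0, "content": 1.0}

title_hit = bm25f_term({"title": 1}, lengths, avg_lengths, weights, idf=1.0)
content_hit = bm25f_term({"content": 1}, lengths, avg_lengths, weights, idf=1.0)
print(title_hit > content_hit)  # True
```

A plain document-wide BM25 would score these two hits identically, since it only sees that the term occurred once in the document. The per-field lengths and averages are the extra statistics the index has to store, which is part of why the cheaper variant is the default.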

Upvotes: 1
