I'm testing the boost operator in Lucene and found some strange behaviour.
Example queries:
query1: "red fox"
query2: "red fox^1.25"
When I run both queries against the text:
"wonderful red fox"
query2 gets a lower score than query1, but I expected query2 to win.
Below are the explanations for both queries.
Explain for query1
{0,4339554 = (MATCH) sum of:
  0,2169777 = (MATCH) weight(content:fox in 0), product of:
    0,7071068 = queryWeight(content:fox), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,304384 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:fox in 0), product of:
      1 = tf(termFreq(content:fox)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
  0,2169777 = (MATCH) weight(content:red in 0), product of:
    0,7071068 = queryWeight(content:red), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,304384 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:red in 0), product of:
      1 = tf(termFreq(content:red)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
}
Explain for query2
{0,4313012 = (MATCH) sum of:
  0,2396118 = (MATCH) weight(content:fox^1.25 in 0), product of:
    0,7808688 = queryWeight(content:fox^1.25), product of:
      1,25 = boost
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,035813 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:fox in 0), product of:
      1 = tf(termFreq(content:fox)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
  0,1916894 = (MATCH) weight(content:red in 0), product of:
    0,6246951 = queryWeight(content:red), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,035813 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:red in 0), product of:
      1 = tf(termFreq(content:red)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
}
Why does the boosted query get a lower score than the unboosted one?
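This is due to the query norm (queryNorm). This factor in the scoring formula attempts to make scores roughly comparable from one query to the next. Because it multiplies the score of every matching document for a given query by the same amount, it does not change the ranking within that query.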
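It is calculated as:
$$\mathrm{queryNorm} = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}}$$
where:
$$\mathrm{sumOfSquaredWeights} = \mathrm{queryBoost}^2 \cdot \sum_{t \in q} \left( \mathrm{idf}(t) \cdot \mathrm{termBoost}(t) \right)^2$$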
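As a sanity check, the formula can be evaluated for both queries. This is a minimal Python sketch, not Lucene code. It assumes a query-level boost of 1, and it assumes the classic `DefaultSimilarity` idf formula, 1 + ln(numDocs / (docFreq + 1)), which reproduces the 0,3068528 in the explanations for a one-document index. The function names `classic_idf` and `query_norm` are hypothetical.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    # Assumed classic Lucene idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def query_norm(term_boosts: list[float], idf: float, query_boost: float = 1.0) -> float:
    # sumOfSquaredWeights = queryBoost^2 * sum over terms of (idf * termBoost)^2
    sum_sq = query_boost ** 2 * sum((idf * b) ** 2 for b in term_boosts)
    return 1.0 / math.sqrt(sum_sq)


# "red" and "fox" both have docFreq=1 in a one-document index, so they share one idf.
idf = classic_idf(doc_freq=1, num_docs=1)
print(f"idf = {idf:.7f}")
print(f"query1 (red fox)      queryNorm = {query_norm([1.0, 1.0], idf):.6f}")
print(f"query2 (red fox^1.25) queryNorm = {query_norm([1.0, 1.25], idf):.6f}")
```

Boosting `fox` raises the sum of squares from 2·idf² to (1 + 1.25²)·idf² = 2.5625·idf², so the norm for query2 comes out smaller, in line with the 2,304384 and 2,035813 shown in the explanations.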
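Boosting fox to 1.25 raises sumOfSquaredWeights from 2·idf² to (1 + 1.25²)·idf² = 2.5625·idf², so queryNorm drops from 2.304384 to 2.035813. That smaller norm multiplies both terms: the weight of fox rises only from 0.2169777 to 0.2396118, while the unboosted red falls from 0.2169777 to 0.1916894, and the total ends up slightly lower. If you remove that factor from the explanations, simply by dividing the final score by the query norm, you find that the second query does indeed get a higher score: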
query1 --> .4339554 / 2.304384 = 0.1883
query2 --> .4313012 / 2.035813 = 0.2119
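The full explain arithmetic can be reproduced the same way. In the classic scoring shown in the explanations, each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. This sketch hard-codes tf = 1 and fieldNorm = 1 from the explanations and takes idf = 0.3068528 as printed; `explain_score` is a hypothetical helper, not Lucene API.

```python
import math

IDF = 0.3068528   # idf(docFreq=1, maxDocs=1), copied from the explain output
TF = 1.0          # tf(termFreq=1)
FIELD_NORM = 1.0  # fieldNorm(field=content, doc=0)


def explain_score(term_boosts: dict[str, float]) -> tuple[float, float]:
    """Return (score, queryNorm) for a flat OR query over the given boosted terms."""
    query_norm = 1.0 / math.sqrt(sum((IDF * b) ** 2 for b in term_boosts.values()))
    score = 0.0
    for boost in term_boosts.values():
        query_weight = boost * IDF * query_norm  # queryWeight(...), product of
        field_weight = TF * IDF * FIELD_NORM     # fieldWeight(...), product of
        score += query_weight * field_weight     # top-level "sum of"
    return score, query_norm


for name, boosts in (("query1", {"red": 1.0, "fox": 1.0}),
                     ("query2", {"red": 1.0, "fox": 1.25})):
    score, qn = explain_score(boosts)
    print(f"{name}: score={score:.7f} queryNorm={qn:.6f} score/queryNorm={score / qn:.4f}")
```

With tf and fieldNorm equal to 1, dividing by queryNorm leaves idf² × (sum of term boosts), i.e. 2·idf² for query1 versus 2.25·idf² for query2, which is why the boosted query wins once the normalization is removed.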
The larger point, though: you shouldn't read too much into comparing scores from one query to the next. Scores are only really meaningful relative to the query that generated them. You can see in the explanations that the boosted term contributes a greater relative weight to the score (0.2396118 for fox versus 0.1916894 for red, a ratio of 1.25, whereas query1 weights them equally), which is all a boost is really intended to do.
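To see that relative effect directly, one can compare each term's share of the total score. The per-term weights below are copied from the two explanations (comma decimals converted to dots); nothing else is assumed, and `shares` is a hypothetical helper name.

```python
# Per-term weights copied from the explain output (comma decimals converted to dots).
query1 = {"fox": 0.2169777, "red": 0.2169777}
query2 = {"fox": 0.2396118, "red": 0.1916894}


def shares(contribs: dict[str, float]) -> dict[str, float]:
    """Fraction of the total score contributed by each term."""
    total = sum(contribs.values())
    return {term: w / total for term, w in contribs.items()}


for name, q in (("query1", query1), ("query2", query2)):
    rounded = {t: round(v, 3) for t, v in shares(q).items()}
    print(name, rounded, "fox/red =", round(q["fox"] / q["red"], 3))
```

In query1 the two terms split the score evenly; in query2 the fox/red ratio is about 1.25, exactly the boost, even though both absolute contributions were rescaled by the smaller queryNorm.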