Scoring difference between seemingly equivalent Solr queries

Question

As I understand Solr's scoring function, the following two queries should be equivalent.

Namely, score(q1, d) = score(q2, d) for each document d in the corpus.

Query 1: evolution OR selection OR germline OR dna OR rna OR mitochondria

Query 2: (evolution OR selection OR germline) OR (dna OR rna OR mitochondria)

The queries are obviously logically equivalent (they both return the same set of documents). Also, both queries consist of the same 6 terms, and each term has a boost of 1 in both queries. Hence each term is supposed to have the same contribution to the total score (same TF, same IDF, same boost).

In spite of that, the queries don't give the same scores.

In general, a conjunction of terms (a OR b OR c OR d) is not the same as a conjunction of queries ((a OR b) OR (c OR d)). What is the semantic difference between the two types of queries? What is causing them to result in different scorings?

The reason I'm asking is that I'm building a custom request handler in which I construct the second type of query (conjunction of queries) while I might actually need to construct the first type of query (conjunction of terms). In other words, this is what I'm doing:

Query q1 = ... // conjunction of the terms evolution, selection, germline
Query q2 = ... // conjunction of the terms dna, rna, mitochondria
// Declared as BooleanQuery (not Query) so that add() is available
BooleanQuery conjunctionOfQueries = new BooleanQuery();
conjunctionOfQueries.add(q1, BooleanClause.Occur.SHOULD);
conjunctionOfQueries.add(q2, BooleanClause.Occur.SHOULD);

while maybe I should actually do:

List<Term> terms = ... // extract all 6 terms from q1 and q2
List<TermQuery> termQueries = ... // create a new TermQuery from each term in terms
// Declared as BooleanQuery (not Query) so that add() is available
BooleanQuery conjunctionOfTerms = new BooleanQuery();
for (TermQuery t : termQueries) {
    conjunctionOfTerms.add(t, BooleanClause.Occur.SHOULD);
}

snakile · Accepted Answer

I've followed femtoRgon's advice to check the debug element of the score calculation. What I've found is that the calculations are indeed mathematically equivalent. The only difference is that the conjunction-of-queries calculation stores intermediate results: the contribution of each sub-query to the sum is first stored in a variable of its own. Apparently, storing these intermediate results accumulates numerical error, because each stored intermediate result is rounded and loses some accuracy. Since the actual queries in the application are quite big (unlike the trivial example query), there is plenty of accuracy to lose, and the accumulated error sometimes even changes the ranking order of the returned documents.

So the conjunction-of-terms query is expected to give a slightly better ranking than the conjunction-of-queries query, because the conjunction-of-queries query accumulates a greater numerical error.
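The rounding effect can be reproduced outside Solr. The following is a minimal sketch, not Lucene's actual scoring code: the class FloatSumDemo, its two methods, and the input values are made up for illustration, and the values are chosen only to make the effect visible in single-precision floats. It adds the same numbers once in a single pass and once via per-group subtotals, and the two results differ.

```java
public class FloatSumDemo {

    // Adds all values left to right into one float accumulator,
    // analogous to summing the clauses of one flat query.
    public static float flatSum(float[] values) {
        float sum = 0f;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    // Sums consecutive groups of groupSize values into subtotals first,
    // then adds the subtotals, analogous to nested sub-queries.
    public static float groupedSum(float[] values, int groupSize) {
        float total = 0f;
        for (int start = 0; start < values.length; start += groupSize) {
            int end = Math.min(start + groupSize, values.length);
            float subtotal = 0f;
            for (int i = start; i < end; i++) {
                subtotal += values[i];
            }
            total += subtotal;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical contributions: 2^24 is the point where a float
        // can no longer represent every integer, so adding 1 is lost.
        float[] contributions = {16777216f, 1f, 1f, 1f};

        float flat = flatSum(contributions);
        float grouped = groupedSum(contributions, 2);

        System.out.println("flat    = " + flat);
        System.out.println("grouped = " + grouped);
        System.out.println("equal   = " + (flat == grouped)); // false for these inputs
    }
}
```

Neither order is inherently the more accurate one. In this example the grouped sum happens to be closer to the exact value, so the direction of the error depends on the magnitudes involved. The point is only that the same terms added in a different grouping can produce a different float.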
