Solr - Why are scores of documents different although the query has not differentiated between them

Question

I ran the queries shown further below and got this response:

"response":{"numFound":200,"start":0,"maxScore":20.458012,"docs":[
      {
        "food_group":"Dairy",
        "carbs":"13.635",
        "protein":"2.625",
        "name":"Apple Milkshake",
        "fat":"3.814",
        "id":"109",
        "calories":99.0,
        "_version_":1565386306583789568,
        "score":20.458012},
      {
        "food_group":"Proteins",
        "carbs":"4.79",
        "protein":"4.574",
        "name":"Chettinad Egg Curry",
        "fat":"6.876",
        "id":"526",
        "calories":99.0,
        "_version_":1565386306489417728,
        "score":19.107327}
.....//other documents...
]}

Queries:

q = (food_group:"Proteins"  OR
food_group:"Dairy"  OR
food_group:"Grains")

bf = div(1,abs(sub(100,calories)))^15
bq = food_group:"Proteins" + food_group:"Dairy" + food_group:"Grains"

My question: even though I have not boosted "Dairy" relative to "Proteins" in bq, why does the "Dairy" document have a higher score?

Persimmonium · Accepted Answer

Because "Dairy" is a rarer term in your corpus. Lucene gives a higher score to a match on a rare term than to a match on a very common term.
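
To make the rarity effect concrete, here is a minimal sketch of the inverse document frequency (IDF) factor in the form Lucene's BM25 similarity uses, ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number of documents containing the term. The document counts below are made up for illustration; they are not taken from your index.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF factor: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical numbers: a 1000-document index and assumed per-term
# document frequencies for the food_group field.
doc_count = 1000
doc_freq = {"Dairy": 40, "Proteins": 70, "Grains": 90}

for term, n in doc_freq.items():
    # The fewer documents contain the term, the larger its IDF.
    print(f"{term}: idf = {bm25_idf(n, doc_count):.3f}")
```

With these assumed counts, "Dairy" gets the largest IDF, so a match on it contributes more to the score than a match on "Proteins" or "Grains", even though no explicit boost favours it.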

If you want to get into the details, look up how BM25 similarity is computed. BM25 is what Lucene (and thus Solr) now uses by default; before that it was TF-IDF, but the two are very similar.
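
You can also check this on your own index by asking Solr to explain each score with the debugQuery parameter. The sketch below only builds the request URL. The host, port and core name ("foods") are placeholders, and defType=edismax is an assumption based on your use of bf and bq; the q, bf and bq values are copied from your question.

```python
from urllib.parse import urlencode

# Placeholder host and core name; replace with your own.
base_url = "http://localhost:8983/solr/foods/select"

params = {
    "defType": "edismax",  # assumed, since bf and bq are (e)dismax parameters
    "q": 'food_group:"Proteins" OR food_group:"Dairy" OR food_group:"Grains"',
    "bf": "div(1,abs(sub(100,calories)))^15",
    "bq": 'food_group:"Proteins" + food_group:"Dairy" + food_group:"Grains"',
    "fl": "id,name,food_group,score",
    "debugQuery": "true",  # ask Solr to include a score explanation per document
}

url = base_url + "?" + urlencode(params)
print(url)
```

In the response, the "explain" entries under "debug" break each document's score into its parts, so you can compare the idf values reported for "Dairy" and "Proteins" directly.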
