Why Elastic search is returning wrong relevance score?

Question

I am learning elastic search, I inserted the following data in the megacorp index having the type employee:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.6931472,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

Then I ran the following request:

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

However the result I got is as follows:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.6682933,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.6682933,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

I have the doubt that relevance score for the following record:

{
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }

is lesser than the previous one. I ran the query with

explain: true

and got the following result:

        {
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.6682933,
    "hits" : [
      {
        "_shard" : "[megacorp][2]",
        "_node" : "pGtCz_FvSTmteJwQKvn_lg",
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.6682933,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ],
          "fielddata" : true
        },
        "_explanation" : {
          "value" : 0.6682933,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.6682933,
              "description" : "weight(about:rock in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.6682933,
                  "description" : "score(doc=0,freq=1.0 = termFreq=1.0
), product of:",
                  "details" : [
                    {
                      "value" : 0.6931472,
                      "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "docFreq",
                          "details" : [ ]
                        },
                        {
                          "value" : 2.0,
                          "description" : "docCount",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.96414346,
                      "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "termFreq=1.0",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "parameter k1",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "parameter b",
                          "details" : [ ]
                        },
                        {
                          "value" : 5.5,
                          "description" : "avgFieldLength",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "fieldLength",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[megacorp][3]",
        "_node" : "pGtCz_FvSTmteJwQKvn_lg",
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ],
          "fielddata" : true
        },
        "_explanation" : {
          "value" : 0.5753642,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.2876821,
              "description" : "weight(about:rock in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.2876821,
                  "description" : "score(doc=0,freq=1.0 = termFreq=1.0
), product of:",
                  "details" : [
                    {
                      "value" : 0.2876821,
                      "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "docFreq",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.0,
                          "description" : "docCount",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "termFreq=1.0",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "parameter k1",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "parameter b",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "avgFieldLength",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "fieldLength",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.2876821,
              "description" : "weight(about:climbing in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.2876821,
                  "description" : "score(doc=0,freq=1.0 = termFreq=1.0
), product of:",
                  "details" : [
                    {
                      "value" : 0.2876821,
                      "description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "docFreq",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.0,
                          "description" : "docCount",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "termFreq=1.0",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "parameter k1",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "parameter b",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "avgFieldLength",
                          "details" : [ ]
                        },
                        {
                          "value" : 6.0,
                          "description" : "fieldLength",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

Can you please tell me what is the reason behind this?

Piotr Pradzynski · Accepted Answer

Short answer: Relevance in Elasticsearch is not a simple topic :) Details below.

I was trying to reproduce your case...

First I've put the two documents:

POST /megacorp/employee/1
{
  "first_name": "John",
  "last_name": "Smith",
  "age": 25,
  "about": "I love to go rock climbing",
  "interests": [
    "sports",
    "music"
  ]
}

POST /megacorp/employee/2
{
  "first_name": "Jane",
  "last_name": "Smith",
  "age": 32,
  "about": "I like to collect rock albums",
  "interests": [
    "music"
  ]
}

and later I used your query:

GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "about": "rock climbing"
    }
  }
}

My results were totally different:

{
  "took": 89,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      }
    ]
  }
}

As you can see results are in "expected" order. Please note that the _score values are totally different than you.

The question is: Why? What happened?

The detailed answer for this situation was described in the Practical BM25 - Part 1: How Shards Affect Relevance Scoring in Elasticsearch article.

Shortly: as you probably could notice Elasticsearch stores documents split among shards. To be faster, by default it uses query_then_fetch strategy. This means that Elasticsearch first asks for results on every shard and later will fetch the results and present them to the user. Of course the same happens with the score calculation.

As you can see, in our results 5 shards where queried. Elasticsearch is using 5 shards by default if not specified on index creation (can be specified with number_of_shards param). That is why our scores are different. Moreover, if you try to do this again yourself there is a big chance that you get different results once again. Everything depends on how the document is distributed among shards. If you set number_of_shards to 1 for this index you will be getting the same scores each time.

An additional thing, also mentioned in the article is:

People start loading just a few documents into their index and ask “why does document A have a higher/lower score than document B” and sometimes the answer is that the user has a relatively high ratio of shards to documents so that the scores are skewed across different shards.

Elasticsearch was designed to maintain a large amount of data and the more data you put into an index, the more accurate the results you get.

I hope my answer explains your doubts.

Why Elastic search is returning wrong relevance score?

Answers (1)

Related Questions