Ashish Mishra

Elasticsearch scoring Issue for same data

I have an Elasticsearch index with the following 4 documents:

PUT test/_doc/1
{
  "tag" : "prove"
}

PUT test/_doc/2
{
  "tag" : "prove"
}

PUT test/_doc/3
{
  "tag" : "freckle"
}

PUT test/_doc/4
{
  "tag" : "freckle"
}

I am running a simple query on this index to fetch the documents whose tag is either prove or freckle. As expected, all four documents are returned.

Query:

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tag": "prove freckle"
          }
        }
      ]
    }
  }
}

Result:

{
  "took" : 950,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.87546873,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.87546873,
        "_source" : {
          "tag" : "freckle"
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.87546873,
        "_source" : {
          "tag" : "freckle"
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.53899646,
        "_source" : {
          "tag" : "prove"
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.53899646,
        "_source" : {
          "tag" : "prove"
        }
      }
    ]
  }
}

But I don't understand why Elasticsearch gives these documents different scores: all of them match the query and all are in the same shard. None of them has any other field, and each tag value occurs in exactly two documents, so where does this variation come from? Why do the freckle docs score higher than the prove docs?

Answers (1)

Barkha Jain

Did you delete any documents from the index?

When I create the index and run the query for the first time, I get the same score for all 4 docs. When I then add 3 more docs and delete them, I get results similar to the ones in your question.

The reason is that Elasticsearch does not delete documents immediately; it only marks them as deleted. Deleted documents are no longer searchable, but they stay in the index segments until those segments are merged, so the statistics used for relevance scoring still include them.
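A toy Python model of this bookkeeping may help. It is a hypothetical simplification (the ToyIndex class and its methods are made up for illustration and are not part of any Elasticsearch or Lucene API): deleting a document only removes it from the set of live docs, while the per-term document frequency and the document count used for scoring keep counting it. The usage at the end replays the experiment described above; the tag of the 3 extra docs is not stated, so "prove" is an arbitrary assumption.

```python
from collections import Counter


class ToyIndex:
    """Hypothetical, heavily simplified model of deletes in an index.

    Deleting only flips a doc out of the 'live' set; the statistics used
    for scoring (doc_freq, doc_count) are not decremented, mimicking how
    deleted docs keep counting until their segment is merged away.
    """

    def __init__(self):
        self.docs = {}             # doc id -> terms of the "tag" field
        self.live = set()          # ids that a search may return
        self.doc_freq = Counter()  # term -> docs containing it (live + deleted)
        self.doc_count = 0         # docs that have the field (live + deleted)

    def add(self, doc_id, tag):
        terms = tag.lower().split()  # crude stand-in for the standard analyzer
        self.docs[doc_id] = terms
        self.live.add(doc_id)
        self.doc_count += 1
        for term in set(terms):
            self.doc_freq[term] += 1

    def delete(self, doc_id):
        # Mark as deleted only: the scoring statistics stay untouched.
        self.live.discard(doc_id)

    def search(self, term):
        # Searches only ever see live documents.
        return sorted(d for d in self.live if term in self.docs[d])


idx = ToyIndex()
for i, tag in enumerate(["prove", "prove", "freckle", "freckle"], start=1):
    idx.add(str(i), tag)
for i in (5, 6, 7):  # three extra docs; their tag is an assumption
    idx.add(str(i), "prove")
for i in (5, 6, 7):
    idx.delete(str(i))

print(len(idx.live), idx.doc_count)                 # 4 7
print(idx.search("prove"))                          # ['1', '2']
print(idx.doc_freq["prove"], idx.doc_freq["freckle"])  # 5 2
```

Searches return only the 4 live docs, yet the document count that feeds the idf is 7, which matches the N = 7 observation below.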

You can verify this by adding "explain": true to your search request. In the idf part of the score explanation you will see that N (the total number of docs with the field) is not the same as the number of docs in the index. In my case it was 7, while the index contained only 4 documents.
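To see how much these statistics matter, here is a small Python sketch of the BM25 formula as Elasticsearch 7.x's explain output describes it: score = boost * idf * tf, with boost 2.2 (k1 + 1), idf = log(1 + (N - n + 0.5) / (n + 0.5)) and tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)). It assumes the default k1 = 1.2 and b = 0.75, and a one-word tag in every doc (including deleted ones), so dl = avgdl = 1. Each doc matches only one of the two query terms, so its score is that single term's score. The values N = 5, n = 3 for prove and n = 2 for freckle are a guess, not something the question states: they correspond to one extra prove doc that was deleted (re-indexing an existing _id also leaves the old version behind as a deleted doc), and they happen to reproduce the question's scores.

```python
import math

# Assumed Elasticsearch defaults for the BM25 similarity.
K1 = 1.2
B = 0.75


def idf(doc_count, doc_freq):
    # Both counts include deleted-but-not-yet-merged documents.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25(doc_count, doc_freq, freq=1, dl=1, avgdl=1.0):
    # Score of one term in one doc: boost (k1 + 1) * idf * tf.
    tf = freq / (freq + K1 * (1 - B + B * dl / avgdl))
    return (K1 + 1) * idf(doc_count, doc_freq) * tf


# Only the 4 live docs, each term in 2 of them: every doc gets ln 2.
print(round(bm25(4, 2), 4))  # 0.6931

# Hypothetical stats with one deleted "prove" doc still counted (N = 5).
print(round(bm25(5, 2), 6))  # freckle: 0.875469
print(round(bm25(5, 3), 6))  # prove:   0.538997
```

Because prove now appears to be the more common term (3 of 5 docs instead of 2 of 4), its idf drops and freckle's rises, which is exactly the ordering in the question.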

You can check the reason behind the deletion logic here.
