arijeet

Reputation: 1874

ElasticSearch scoring issue

I'm trying to figure out the logic that is being used by ElasticSearch in ranking the results by score.

I have 4 indexes in total, and I'm querying all of them for a single term. The query I'm using is:

GET /_all/static/_search
{
  "query": {
    "match": {
      "name": "chinese"
    }
  }
}

The (partial) response I get is:

{
   "took": 17,
   "timed_out": false,
   "_shards": {
      "total": 40,
      "successful": 40,
      "failed": 0
   },
   "hits": {
      "total": 6,
      "max_score": 2.96844,
      "hits": [
         {
            "_shard": 1,
            "_node": "Hz9L2DZ-ShSajaNvoyU8Eg",
            "_index": "restaurant",
            "_type": "static",
            "_id": "XecLkyYNQWihuR2atFc5JQ",
            "_score": 2.96844,
            "_source": {
               "name": "Just Chinese"
            },
            "_explanation": {
               "value": 2.96844,
               "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:",
               "details": [
                  {
                     "value": 2.96844,
                     "description": "fieldWeight in 1, product of:",
                     "details": [
                        {
                           "value": 1,
                           "description": "tf(freq=1.0), with freq of:",
                           "details": [
                              {
                                 "value": 1,
                                 "description": "termFreq=1.0"
                              }
                           ]
                        },
                        {
                           "value": 4.749504,
                           "description": "idf(docFreq=3, maxDocs=170)"
                        },
                        {
                           "value": 0.625,
                           "description": "fieldNorm(doc=1)"
                        }
                     ]
                  }
               ]
            }
         },
         {
            "_shard": 1,
            "_node": "Hz9L2DZ-ShSajaNvoyU8Eg",
            "_index": "restaurant",
            "_type": "static",
            "_id": "IAUpkC55ReySjvl9Xr5MVw",
            "_score": 2.96844,
            "_source": {
               "name": "The Chinese Hut"
            },
            "_explanation": {
               "value": 2.96844,
               "description": "weight(name:chinese in 5) [PerFieldSimilarity], result of:",
               "details": [
                  {
                     "value": 2.96844,
                     "description": "fieldWeight in 5, product of:",
                     "details": [
                        {
                           "value": 1,
                           "description": "tf(freq=1.0), with freq of:",
                           "details": [
                              {
                                 "value": 1,
                                 "description": "termFreq=1.0"
                              }
                           ]
                        },
                        {
                           "value": 4.749504,
                           "description": "idf(docFreq=3, maxDocs=170)"
                        },
                        {
                           "value": 0.625,
                           "description": "fieldNorm(doc=5)"
                        }
                     ]
                  }
               ]
            }
         },
         {
            "_shard": 2,
            "_node": "Hz9L2DZ-ShSajaNvoyU8Eg",
            "_index": "cuisine",
            "_type": "static",
            "_id": "6",
            "_score": 2.7047482,
            "_source": {
               "name": "Chinese"
            },
            "_explanation": {
               "value": 2.7047482,
               "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:",
               "details": [
                  {
                     "value": 2.7047482,
                     "description": "fieldWeight in 1, product of:",
                     "details": [
                        {
                           "value": 1,
                           "description": "tf(freq=1.0), with freq of:",
                           "details": [
                              {
                                 "value": 1,
                                 "description": "termFreq=1.0"
                              }
                           ]
                        },
                        {
                           "value": 2.7047482,
                           "description": "idf(docFreq=1, maxDocs=11)"
                        },
                        {
                           "value": 1,
                           "description": "fieldNorm(doc=1)"
                        }
                     ]
                  }
               ]
            }
         },

My question: as I understand it, Elasticsearch gives a higher score to a match in a shorter field value. Why, then, are results like "Just Chinese" and "The Chinese Hut" from the restaurant index ranked above the expected best match, "Chinese", from the cuisine index? As far as I know, I did not use any special analyzers when indexing these documents. Everything is default.

What am I missing and how can I get the expected result?

Upvotes: 1

Views: 378

Answers (1)

imotov

Reputation: 30163

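One of the most important parameters in calculating scores is inverse document frequency (IDF). By default, each Elasticsearch shard estimates the global IDF from its own local term statistics. This works well when you have a lot of similar records that are evenly distributed across shards. However, when you have only a few records, or when you combine results from shards holding very different types of records (cuisine names and restaurant names), the estimated IDF can produce strange results.

Your output shows exactly this: the restaurant shard reports idf(docFreq=3, maxDocs=170) = 4.749504, while the cuisine shard reports idf(docFreq=1, maxDocs=11) = 2.7047482. The higher IDF on the restaurant shard outweighs its lower fieldNorm (0.625 versus 1), so the restaurant hits end up with the higher score.

The solution is to use the dfs_query_then_fetch search type, which first collects term statistics from all the shards involved so that every shard scores with the same global IDF.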
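To make the effect concrete, here is a rough Python sketch that recomputes the scores from the explain output in the question. It assumes the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which reproduce the idf values shown above. The function names are mine, and the "pooled" statistics at the end are purely illustrative: they combine only the two shards visible in the partial response, not what a real DFS phase would gather from all 40 shards.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene TF-IDF inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the explain output in the question (shard-local statistics).
restaurant_score = field_weight(1.0, doc_freq=3, max_docs=170, field_norm=0.625)
cuisine_score = field_weight(1.0, doc_freq=1, max_docs=11, field_norm=1.0)

# Illustrative assumption: pool the statistics of just these two shards so that
# both documents are scored with the same IDF.
pooled_doc_freq = 3 + 1
pooled_max_docs = 170 + 11
pooled_restaurant = field_weight(1.0, pooled_doc_freq, pooled_max_docs, 0.625)
pooled_cuisine = field_weight(1.0, pooled_doc_freq, pooled_max_docs, 1.0)

print("local  :", restaurant_score, cuisine_score)
print("pooled :", pooled_restaurant, pooled_cuisine)
```

With shard-local statistics the restaurant documents win. Once both shards share the same IDF, only fieldNorm differs, so the shorter "Chinese" document comes out on top.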

By the way, to see how Elasticsearch calculated a score, you can set the explain parameter in the search request body or in the URL. When you ask questions about scoring, it helps to include the output produced with explain set to true.
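As a sketch of what such a request could look like, the Python snippet below builds the URL and body for the query from the question, with search_type=dfs_query_then_fetch in the query string and explain set to true in the body. It only constructs the request and does not send it. The host http://localhost:9200 and the helper name build_search are assumptions for illustration, so adjust them to your setup.

```python
import json
from urllib.parse import urlencode

# Assumption: a node reachable on the default local HTTP port.
BASE_URL = "http://localhost:9200"


def build_search(term: str, explain: bool = True) -> tuple[str, str]:
    """Return (url, json_body) for a DFS search on the `name` field."""
    params = urlencode({"search_type": "dfs_query_then_fetch"})
    url = f"{BASE_URL}/_all/static/_search?{params}"
    body = {
        "explain": explain,  # ask for the per-hit score explanation
        "query": {"match": {"name": term}},
    }
    return url, json.dumps(body)


url, body = build_search("chinese")
print(url)
print(body)
```

The printed URL and body can be sent with any HTTP client, or pasted into the same console you used for the original query.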

Upvotes: 3
