Michael Mincone

Reputation: 43

Elasticsearch score exact matches equally regardless of field length

I have a simple index with one field which is an array of specialties. One record is

"specialties" : ["hand"]

and the other record is

"specialties" : ["hand","foot","eye"]

When I run the following query:

GET test/_search/
{
  "query": {
    "term": {
      "specialties": {
        "value": "hand",
        "boost": 1.0
      }
    }
  }
}

I get:

"hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "XRrLTHUBX_caCRKF8MQI",
        "_score" : 0.22920427,
        "_source" : {
          "specialties" : [
            "hand"
          ]
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "XBrLTHUBX_caCRKFAsSM",
        "_score" : 0.1513613,
        "_source" : {
          "specialties" : [
            "hand",
            "foot",
            "eye"
          ]
        }
      }
    ]

My question is: what can I do to get both of these records scored the same when performing a query for "hand"?

Upvotes: 3

Views: 1159

Answers (1)

Bhavya

Reputation: 16192

The scores are not equal because of field-length normalization. To resolve this, you have to disable norms on the field.

Using the explain API, you will find that the scores differ because of the dl parameter, i.e. the length of the field.

The dl value is 3.0 for "specialties" : ["hand","foot","eye"] and 1.0 for "specialties" : ["hand"]. Because dl appears in the denominator of the tf formula, the tf score is lower for the second document.

As a result, the final score (calculated as boost * idf * tf) is also lower for the second document. The tf part of the explain output for that document:

{
  "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
  "details": [
    {
      "value": 1.0,
      "description": "freq, occurrences of term within document",
      "details": []
    },
    {
      "value": 1.2,
      "description": "k1, term saturation parameter",
      "details": []
    },
    {
      "value": 0.75,
      "description": "b, length normalization parameter",
      "details": []
    },
    {
      "value": 3.0,
      "description": "dl, length of field",
      "details": []
    },
    {
      "value": 2.0,
      "description": "avgdl, average length of field",
      "details": []
    }
  ]
}
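
To make the effect of dl concrete, here is a small Python sketch that plugs the values from the explain output into the tf formula for both documents. The function name bm25_tf is just an illustrative helper (not part of any Elasticsearch client). The values freq, k1, b and avgdl are copied from the explain output, and dl is 1.0 or 3.0 as described above. Since boost and idf are the same for both documents, the ratio of the two tf values should match the ratio of the two scores from the question.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf component exactly as printed in the explain description:
    freq / (freq + k1 * (1 - b + b * dl / avgdl))
    """
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


AVGDL = 2.0  # "avgdl, average length of field" from the explain output

# "hand" occurs once in each document, so freq is 1.0 in both cases.
tf_short = bm25_tf(freq=1.0, dl=1.0, avgdl=AVGDL)  # ["hand"]
tf_long = bm25_tf(freq=1.0, dl=3.0, avgdl=AVGDL)   # ["hand", "foot", "eye"]

# boost and idf are identical for both documents, so they cancel out
# when the two scores are divided by each other.
tf_ratio = tf_long / tf_short
score_ratio = 0.1513613 / 0.22920427  # _score values from the question

print(f"tf (dl=1): {tf_short:.6f}")
print(f"tf (dl=3): {tf_long:.6f}")
print(f"tf ratio:    {tf_ratio:.4f}")
print(f"score ratio: {score_ratio:.4f}")
```

Both ratios come out at roughly 0.66, which suggests the whole score difference is explained by the dl term.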

If you want Elasticsearch to score exact matches equally regardless of field length, you need to disable norms on the specialties field.

Here is a working example with the index mapping, search query, and search result (the indexed data is the same as in the question).

Index Mapping:

{
  "mappings": {
    "properties": {
      "specialties": {
        "type": "text",
        "norms": false
      }
    }
  }
}

Search Query:

{
  "query": {
    "term": {
      "specialties": {
        "value": "hand",
        "boost": 1.0
      }
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "64471434",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.22920428,
        "_source": {
          "specialties": [
            "hand",
            "foot",
            "eye"
          ]
        }
      },
      {
        "_index": "64471434",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.22920428,
        "_source": {
          "specialties": [
            "hand"
          ]
        }
      }
    ]
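
If you want to reproduce this end to end, the following Python sketch creates the index with the mapping above, indexes the two documents from the question, and runs the term query. It is only a sketch under some assumptions: an unsecured Elasticsearch 7.x node reachable at http://localhost:9200, and a made-up index name (specialties-test). ES_URL, INDEX, es_request and reproduce are illustrative names, not part of any official client.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication
INDEX = "specialties-test"        # hypothetical index name

# Mapping from the answer: text field with norms disabled.
MAPPING = {
    "mappings": {
        "properties": {
            "specialties": {"type": "text", "norms": False}
        }
    }
}

# The two records from the question.
DOCS = {
    "1": {"specialties": ["hand"]},
    "2": {"specialties": ["hand", "foot", "eye"]},
}

QUERY = {"query": {"term": {"specialties": {"value": "hand", "boost": 1.0}}}}


def es_request(method, path, body=None):
    """Send a JSON request to Elasticsearch and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        f"{ES_URL}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def reproduce():
    """Create the index, index both documents, and return (id, score) pairs."""
    es_request("PUT", f"/{INDEX}", MAPPING)
    for doc_id, doc in DOCS.items():
        # refresh=true makes the document searchable immediately
        es_request("PUT", f"/{INDEX}/_doc/{doc_id}?refresh=true", doc)
    result = es_request("POST", f"/{INDEX}/_search", QUERY)
    return [(hit["_id"], hit["_score"]) for hit in result["hits"]["hits"]]
```

Calling reproduce() against a running node should return both document ids with the same score, as in the search result above.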

Update 1:

If you only need to match documents on the condition (and scoring is not important), you can use a constant score query or a bool filter:

{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "specialties": "hand" }
      }
    }
  }
}

Upvotes: 2
