How to customize score calculate in ElasticSearch?

Question

I have the following request.

    {'bool':
         {'must': [
              {"terms": {"state.keyword": ["Alaska", "Alabama"]}
          ],
          'should': [
              {'match': {'abstract': 'Spill and Overfill Prevention 18 AAC 78.045'}},
              {'match': {'title': 'Spill and Overfill Prevention 18 AAC 78.045'}},
              {'constant_score': {
                  'filter': {
                      'match': {'title': 'Spill and Overfill Prevention 18 AAC 78.045'}
                  }
              }}
          ]}
     }

Need to calculate the score by title (match).

For this I tried using constant_score.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html

However, it did not give the expected effect. It just increments the result of each element by exactly 1.

Here is the result of analyze

{'took': 21, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 6, 'relation': 'eq'}, 'max_score'
: 4.754379, 'hits': [{'_index': 'articles', '_type': '_doc', '_id': '483703', '_score': 4.754379, '_source':

Here is the explain result

{'_index': 'articles', '_type': '_doc', '_id': '483703', 'matched': True, 'explanation': {'value': 6.6602507, 'description': 'sum of:', 'details': [{'value': 0.150
05009, 'description': 'weight(legal_language:and in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.15005009, 'description': 'score(freq=14.0), compu
ted as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as log(1 +
(N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N
, total number of documents with field', 'details': []}]}, {'value': 0.92034066, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from
:', 'details': [{'value': 14.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation para
meter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value': 504.0, 'description': 'dl, length of field (a
pproximate)', 'details': []}, {'value': 497.5, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 0.3779109, 'description': 'weight(l
egal_language:18 in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.3779109, 'description': 'score(freq=3.0), computed as boost * idf * tf from:', 'd
etails': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.24116206, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:',
'details': [{'value': 5, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with fi
eld', 'details': []}]}, {'value': 0.7122915, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 3.0, 'desc
ription': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.7
5, 'description': 'b, length normalization parameter', 'details': []}, {'value': 504.0, 'description': 'dl, length of field (approximate)', 'details': []}, {'value
': 497.5, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 0.3779109, 'description': 'weight(legal_language:aac in 2) [PerFieldSimi
larity], result of:', 'details': [{'value': 0.3779109, 'description': 'score(freq=3.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'descriptio
n': 'boost', 'details': []}, {'value': 0.24116206, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 5, 'descriptio
n': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.
7122915, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 3.0, 'description': 'freq, occurrences of term
within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normali
zation parameter', 'details': []}, {'value': 504.0, 'description': 'dl, length of field (approximate)', 'details': []}, {'value': 497.5, 'description': 'avgdl, ave
rage length of field', 'details': []}]}]}]}, {'value': 1.0089812, 'description': 'weight(title:spill in 2) [PerFieldSimilarity], result of:', 'details': [{'value':
1.0089812, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 1.02
96195, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 2, 'description': 'n, number of documents containing term'
, 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, computed as fr
eq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value'
: 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value'
: 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value':
0.072622515, 'description': 'weight(title:and in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.072622515, 'description': 'score(freq=1.0), computed
as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as log(1 + (N
- n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, t
otal number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:',
'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation paramete
r', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field', 'detai
ls': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 1.0089812, 'description': 'weight(title:overfill in
2) [PerFieldSimilarity], result of:', 'details': [{'value': 1.0089812, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value':
2.2, 'description': 'boost', 'details': []}, {'value': 1.0296195, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value'
: 2, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}
]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, oc
currences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': '
b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl
, average length of field', 'details': []}]}]}]}, {'value': 1.0089812, 'description': 'weight(title:prevention in 2) [PerFieldSimilarity], result of:', 'details':
[{'value': 1.0089812, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'va
lue': 1.0296195, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 2, 'description': 'n, number of documents contai
ning term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, comp
uted as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}
, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}
, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]},
{'value': 0.072622515, 'description': 'weight(title:18 in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.072622515, 'description': 'score(freq=1.0),
computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as lo
g(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'descriptio
n': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)
) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation
parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field
', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 0.072622515, 'description': 'weight(title:
aac in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.072622515, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{
'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details':
[{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'det
ails': []}]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description':
'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'descr
iption': 'b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'descriptio
n': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 1.5095675, 'description': 'weight(title:78.045 in 2) [PerFieldSimilarity], result of:', 'deta
ils': [{'value': 1.5095675, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}
, {'value': 1.5404451, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 1, 'description': 'n, number of documents
containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf
, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details
': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details
': []}, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}
]}]}, {'value': 1.0, 'description': 'ConstantScore(title.keyword:Spill and Overfill Prevention 18 AAC 78.045)', 'details': []}]}}

With script_score

{'query': {
    'function_score': {
        'query': {
            'bool': {
                'should': [ 
                    {'match': {'legal_language': 'inspections and testing 691'}},
                    {'match': {'title': 'inspections and testing 691'}}
                ]
            }
        },
        'script_score': {
            'script': {'source': "doc['title'].value"}
        }
    }
}}

Mapping

{
  "articles" : {
    "mappings" : {
      "properties" : {
        "abstract" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "categories" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "cfr40_part280" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "citation" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "effective_date" : {
          "type" : "date"
        },
        "id" : {
          "type" : "long"
        },
        "legal_language" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "local_regulation" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "reference_images" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "tags" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "unique_id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Traceback

> Traceback (most recent call last):   File
> "D:\work_projects\dewey_project\webapp\articles\services\elasticsearch_service.py",
> line 103, in retrieve_articles
>     result = current_app.elasticsearch.search(   File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\client\utils.py",
> line 84, in _wrapped
>     return func(*args, params=params, **kwargs)   File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\client\__init__.py",
> line 1547, in search
>     return self.transport.perform_request(   File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch	ransport.py",
> line 351, in perform_request
>     status, headers_response, data = connection.perform_request(   File
> "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\connection\http_urllib3.py",
> line 261, in perform_request
>     self._raise_error(response.status, raw_data)   File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\connection\base.py",
> line 181, in _raise_error
>     raise HTTP_EXCEPTIONS.get(status_code, TransportError)( elasticsearch.exceptions.RequestError: RequestError(400,
> 'search_phase_execution_exception', 'runtime error')

Anton Lashin · Accepted Answer

It's not quite clear what you are trying to achieve, but it looks like you want to receive TF/IDF score for a document based on a match only on the title field. And also you want to put additional constraints to your query. If so, you should use the filter clauses for the bool query. They will not modify your score but will filter the results based on the match in them.

{
    "bool": {
        "should": [
            {"match": {"title": "Spill and Overfill Prevention 18 AAC 78.045"}}
        ],
        "filter": [
            {"match": {"abstract": "Spill and Overfill Prevention 18 AAC 78.045"}},
            {"terms": {"state.keyword": ["Alaska", "Alabama"]}
        ]
    }
}

This will return a bit different results than your original query because it will require a match of the abstract field on query Spill and Overfill Prevention 18 AAC 78.045. If you want to keep the behavior of the original query then you should move it as a constant score query inside the should block

{
    "query": {
        "bool": {
            "should": [
                {"match": {"title": "Spill and Overfill Prevention 18 AAC 78.045"}},
                {"constant_score": {
                    "filter": {
                        "match": {"legal_language": "Spill and Overfill Prevention 18 AAC 78.045"}
                    }
                }}
            ],
            "filter": [
                {"terms": {"state.keyword": ["Alaska", "Alabama"]]}},
            ],
        }
    }
}

and then subtract 1 from the result score.

How to customize score calculate in ElasticSearch?

Answers (2)

Related Questions