Reputation: 43
I have a simple index with one field which is an array of specialties. One record is
"specialties" : ["hand"]
and the other record is
"specialties" : ["hand","foot","eye"]
When I run the following query:
GET test/_search/
{
"query": {
"term": {
"specialties": {
"value": "hand",
"boost": 1.0
}
}
}
}
I get:
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "XRrLTHUBX_caCRKF8MQI",
"_score" : 0.22920427,
"_source" : {
"specialties" : [
"hand"
]
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "XBrLTHUBX_caCRKFAsSM",
"_score" : 0.1513613,
"_source" : {
"specialties" : [
"hand",
"foot",
"eye"
]
}
}
]
My question is: what can I do to get both of these records scored the same when performing a query for "hand"?
Upvotes: 3
Views: 1159
Reputation: 16192
The scores differ because of field-length normalization. To make them equal, you have to disable norms on the field.
Using the explain API, you can see that the scores differ because of the dl parameter, i.e. the length of the field. The dl value is 3.0 for "specialties" : ["hand","foot","eye"] and 1.0 for "specialties" : ["hand"]. Because dl appears in the denominator of the tf formula, tf is lower for the longer document, and so is its final score, which is calculated as boost * idf * tf. The relevant part of the explain output for the longer document:
{
"description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details": [
{
"value": 1.0,
"description": "freq, occurrences of term within document",
"details": []
},
{
"value": 1.2,
"description": "k1, term saturation parameter",
"details": []
},
{
"value": 0.75,
"description": "b, length normalization parameter",
"details": []
},
{
"value": 3.0,
"description": "dl, length of field",
"details": []
},
{
"value": 2.0,
"description": "avgdl, average length of field",
"details": []
}
]
}
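To check the arithmetic, here is a minimal Python sketch that plugs the values from the explain output into the tf formula. The function name `bm25_tf` is just for illustration, and I am assuming freq = 1 and avgdl = 2 for the shorter document as well (its explain output is not shown above, but it lives in the same index and contains "hand" once).

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf term as described in the explain output:
    freq / (freq + k1 * (1 - b + b * dl / avgdl))"""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

# dl = 1 for ["hand"], dl = 3 for ["hand","foot","eye"]
tf_short = bm25_tf(freq=1.0, dl=1.0, avgdl=2.0)
tf_long = bm25_tf(freq=1.0, dl=3.0, avgdl=2.0)

# boost and idf are the same for both documents, so the ratio of the
# final scores should equal the ratio of the tf values.
predicted_ratio = tf_long / tf_short
observed_ratio = 0.1513613 / 0.22920427  # scores from the question

print(tf_short, tf_long)
print(predicted_ratio, observed_ratio)
```

Both ratios come out at about 0.66, so the gap between the two scores is explained by dl alone.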
If you want Elasticsearch to score exact matches equally regardless of field length, you need to disable norms on the specialties field.
Here is a working example with the index mapping, the search query, and the search result (the indexed data is the same as in the question).
Index Mapping:
{
"mappings": {
"properties": {
"specialties": {
"type": "text",
"norms": false
}
}
}
}
Search Query:
{
"query": {
"term": {
"specialties": {
"value": "hand",
"boost": 1.0
}
}
}
}
Search Result:
"hits": [
{
"_index": "64471434",
"_type": "_doc",
"_id": "2",
"_score": 0.22920428,
"_source": {
"specialties": [
"hand",
"foot",
"eye"
]
}
},
{
"_index": "64471434",
"_type": "_doc",
"_id": "1",
"_score": 0.22920428,
"_source": {
"specialties": [
"hand"
]
}
}
]
Update 1:
If you only need to match documents on the condition and scoring is not important, you can use a constant_score query or a bool filter instead:
{
"query": {
"constant_score": {
"filter": {
"term": { "specialties": "hand" }
}
}
}
}
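For the bool filter variant, a rough sketch of the request body, built here as a Python dict and printed as JSON so it can be pasted into any client. The helper name `bool_filter_query` is made up for this example. As far as I know, clauses in filter context are not scored, so a bool query that has only filter clauses returns every hit with the same score.

```python
import json

def bool_filter_query(field, value):
    """Build a search body that matches `value` in `field` in filter context."""
    return {
        "query": {
            "bool": {
                # filter clauses only decide match / no match; they do not score
                "filter": [
                    {"term": {field: value}}
                ]
            }
        }
    }

body = bool_filter_query("specialties", "hand")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to GET test/_search the same way as the queries above.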
Upvotes: 2