Reputation: 63
I'm trying to solve a performance issue we have when querying ElasticSearch for several thousand results. We do some post-query processing and only show the top X results (a query may return ~100,000 results, while we only need the top 100 according to our scoring mechanics).
The basic mechanics are as follows: the ElasticSearch score is normalized to 0..1 ( score / max(score) ), we add our own ranking score (also normalized to 0..1), and divide the sum by 2.
What I'd like to do is move this logic into ElasticSearch using custom scoring (or anything else that works): https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-script-score
The problem I'm facing is that with score scripts / score functions I can't find a way to do something like max(_score) to normalize the score between 0 and 1. This is what I would like to write:
"script_score" : {
"script" : "(_score / max(_score) + doc['some_normalized_field'].value)/2"
}
Any ideas are welcome.
Upvotes: 6
Views: 18336
Reputation: 625
You cannot get max_score before the _score has actually been generated for all the matching documents. A script_score query first generates the _score for every matching document, and only then does Elasticsearch report max_score.
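One possible way around this ordering, which is not part of the answer above, is to split the work into two requests: a cheap probe that only reads max_score, followed by the real query with that value passed in as a script parameter. The sketch below only builds the two request bodies as Python dicts. The helper names are hypothetical, the field name is taken from the question, and it assumes the probe's max_score is greater than zero and that the index does not change between the two requests.

```python
def build_max_score_probe(query):
    """Body for a first request whose only purpose is to read max_score."""
    # size=1 keeps the response small; max_score is reported alongside the hits.
    return {"size": 1, "_source": False, "query": query}


def build_normalized_query(query, max_score, size=100):
    """Body for a second request that normalizes _score with a known max_score."""
    if max_score <= 0:
        raise ValueError("max_score must be positive to normalize by it")
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": query,
                "script_score": {
                    "script": {
                        "lang": "painless",
                        # Same formula as in the question, with max(_score)
                        # replaced by a parameter supplied by the client.
                        "source": "(_score / params.max_score"
                                  " + doc['some_normalized_field'].value) / 2",
                        "params": {"max_score": max_score},
                    }
                },
                # Use the script result as the final score instead of
                # multiplying it with the original _score.
                "boost_mode": "replace",
            }
        },
    }


if __name__ == "__main__":
    user_query = {"match": {"body": "dennis"}}
    print(build_max_score_probe(user_query))
    print(build_normalized_query(user_query, max_score=7.5))
```

The cost is one extra round trip, but the second request can then return just the top 100 hits instead of all matching documents.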
As far as I understand your problem, you want to preserve the max_score that was generated by the original query, before "script_score" is applied. You can get the required result by doing the computation on the front end: apply your formula there and then sort the results.
You can return your factor with each hit using script_fields, for example:
{
  "explain": true,
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "total_goals": {
      "script": {
        "lang": "painless",
        "source": """
          int total = 0;
          for (int i = 0; i < doc['goals'].length; ++i) {
            total += doc['goals'][i];
          }
          return total;
        """,
        "params": {
          "last": "any parameters required"
        }
      }
    }
  }
}
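The front-end step described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: it assumes each hit carries its `_score` plus a script field holding the 0..1 ranking score, and the function name `rerank` and the field name `rank_score` are made up for illustration.

```python
def rerank(hits, max_score, field="rank_score", top_n=100):
    """Combine the normalized Elasticsearch score with a ranking score.

    hits      -- list of hit dicts as found under hits.hits in a response
    max_score -- hits.max_score from the same response
    field     -- name of the script field holding the 0..1 ranking score
    """
    rescored = []
    for hit in hits:
        # Normalize the Elasticsearch score to 0..1 (guard against max_score == 0).
        es_norm = hit["_score"] / max_score if max_score else 0.0
        # Script fields are returned as single-element lists under "fields".
        rank = hit["fields"][field][0]
        # Average of the two normalized scores, as in the question.
        rescored.append(((es_norm + rank) / 2, hit))
    # Highest combined score first, then keep only the top N.
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored[:top_n]


if __name__ == "__main__":
    sample = [
        {"_id": "a", "_score": 4.0, "fields": {"rank_score": [0.1]}},
        {"_id": "b", "_score": 2.0, "fields": {"rank_score": [0.9]}},
    ]
    for combined, hit in rerank(sample, max_score=4.0):
        print(hit["_id"], combined)
```

Note that this still requires fetching every matching hit, so it does not remove the performance cost the question describes; it only keeps the formula in one place.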
Upvotes: 3
Reputation: 1310
According to this GitHub ticket, it is simply not possible to normalize the score; the suggested workaround is to use boolean similarity.
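For reference, boolean similarity is set per field in the index mapping. The sketch below builds such a mapping as a Python dict; the helper name is hypothetical, and the typeless `mappings.properties` layout assumes a recent Elasticsearch version. With this similarity a matching term contributes only its query boost to the score, so whether it fits depends on how much the relevance part of the ranking matters.

```python
def boolean_similarity_mapping(field_name):
    """Index-creation body that scores `field_name` with boolean similarity."""
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "text",
                    # Matching terms score as their query boost, with no
                    # term-frequency or length normalization applied.
                    "similarity": "boolean",
                }
            }
        }
    }


if __name__ == "__main__":
    print(boolean_similarity_mapping("body"))
```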
Upvotes: 0
Reputation: 714
I am not sure that I understand your question. Do you want to limit the number of results?
Have you tried this?
{
  "from": 0, "size": 10,
  "query": {
    "term": { "name": "dennis" }
  }
}
You can use sort to define the sort order; by default, results are sorted by the score of the main query.
You can also use aggregations (with or without function_score):
{
  "query": {
    "function_score": {
      "functions": [
        {
          "gauss": {
            "date": {
              "scale": "3d",
              "offset": "7d",
              "decay": 0.1
            }
          }
        },
        {
          "gauss": {
            "priority": {
              "origin": "0",
              "scale": "100"
            }
          }
        }
      ],
      "query": {
        "match": { "body": "dennis" }
      }
    }
  },
  "aggs": {
    "hits": {
      "top_hits": {
        "size": 10
      }
    }
  }
}
Upvotes: 0