Reputation: 111
I am looking for an elegant way to sort my results alphabetically first and then by numbers.
My current solution prepends a "~" to values that start with a number, using the sort script below ("~" sorts lexicographically after "z"):
"sort": {
  "_script": {
    "script": "s = doc['name.raw'].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s.split(' ')[0][0]); if (n != null) { '~' + s } else { s }",
    "type": "string"
  }
}
but I wonder if there is a more elegant and perhaps more performant solution.
Input:
ZBA ABC ...
ABC SDK ...
123 RIU ...
12B BTE ...
11J TRE ...
BCA 642 ...
Desired output:
ABC SDK ...
BCA 642 ...
ZBA ABC ...
11J TRE ...
12B BTE ...
123 RIU ...
Upvotes: 4
Views: 2569
Reputation: 217254
You can do the same thing at indexing time using a custom analyzer that leverages a pattern_replace character filter. Doing the work once at indexing time is more performant than running a script sort at search time for each query.
It works in the same vein as your solution: the character filter prefixes every digit with a tilde ~ and leaves all other characters untouched, so digits sort after letters at every position. The difference is that this happens at indexing time, and the resulting value is indexed in the name.sort field.
PUT /tests
{
  "settings": {
    "analysis": {
      "char_filter": {
        "pre_num": {
          "type": "pattern_replace",
          "pattern": "(\\d)",
          "replacement": "~$1"
        }
      },
      "analyzer": {
        "number_tagger": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [],
          "char_filter": [
            "pre_num"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "sort": {
              "type": "string",
              "analyzer": "number_tagger",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
Then you can index your data:
POST /tests/test/_bulk
{"index": {}}
{"name": "ZBA ABC"}
{"index": {}}
{"name": "ABC SDK"}
{"index": {}}
{"name": "123 RIU"}
{"index": {}}
{"name": "12B BTE"}
{"index": {}}
{"name": "11J TRE"}
{"index": {}}
{"name": "BCA 642"}
Then your query can simply look like this:
POST /tests/_search
{
  "sort": {
    "name.sort": "asc"
  }
}
And the response you'll get is:
{
  "hits": {
    "hits": [
      {
        "_source": {
          "name": "ABC SDK"
        }
      },
      {
        "_source": {
          "name": "BCA 642"
        }
      },
      {
        "_source": {
          "name": "ZBA ABC"
        }
      },
      {
        "_source": {
          "name": "11J TRE"
        }
      },
      {
        "_source": {
          "name": "12B BTE"
        }
      },
      {
        "_source": {
          "name": "123 RIU"
        }
      }
    ]
  }
}
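To see why that ordering comes out of the pre_num character filter, here is a rough Python sketch that mimics the transformation outside Elasticsearch. It is only an approximation under stated assumptions: Python's re.sub stands in for the pattern_replace filter, plain string comparison stands in for how the single keyword token is ordered, and the names sort_key and names are made up for this illustration.

```python
import re


def sort_key(name: str) -> str:
    """Mimic the pre_num char filter: prefix every ASCII digit with '~'.

    Assumption: re.ASCII keeps \\d limited to 0-9, which is how the
    pattern behaves for the sample data in the question.
    """
    return re.sub(r"(\d)", r"~\1", name, flags=re.ASCII)


# Sample values from the question (hypothetical variable name).
names = ["ZBA ABC", "ABC SDK", "123 RIU", "12B BTE", "11J TRE", "BCA 642"]

# '~' (0x7E) compares greater than every ASCII letter and digit, so a
# tagged digit sorts after a letter at the same position.
ordered = sorted(names, key=sort_key)

for name in ordered:
    print(f"{sort_key(name):<12} <- {name}")
```

Because every digit is tagged rather than just the first character, "12B BTE" becomes "~1~2B BTE" and "123 RIU" becomes "~1~2~3 RIU", so the former sorts first ("B" is less than "~"), which is the order asked for in the question.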
Upvotes: 3