Reputation: 2050
When sorting on a string field with multiple words, Elasticsearch is splitting the string value and using the min or max as the sort value. I.E.: when sorting on a field with the value "Eye of the Tiger" in ascending order, the sort value is: "Eye" and when sorting in descending order the value is: "Tiger".
Lets say I have "Eye of the Tiger" and "Wheel of Death" as entries in my index, when I do an ascending sort on this field, I would expect, "Eye of the Tiger" to be first, since "E" comes before "W", but what I'm seeing when sorting on this field, "Wheel of Death" is coming up first, since "D" is the min value of that term and "E" is the min value of "Eye of the Tiger".
Does anyone know how to turn off this behavior and just allow a regular sort on this string field?
Upvotes: 13
Views: 9531
Reputation: 6104
If you want the sorting to be case-insensitive "index": "not_analyzed"
doesn't work, so I've created a custom sort analyzer.
index-settings.yml
index :
analysis :
analyzer :
sort :
type : custom
tokenizer : keyword
filter : [lowercase]
Mapping:
...
"articleName": {
"type": "string",
"analyzer": "standard",
"fields": {
"sort": {
"type": "string",
"analyzer": "sort"
}
}
}
...
Upvotes: 4
Reputation: 321
As mconlin mentioned if you want to sort on the unanalyzed doc field you need to specify "index": "not_analyzed" to sort as you described. But if you're looking to be able to keep this field tokenized to search on, this post by sloan shows a great example. Using multi-field to keep two different mappings for a field is very common in Elasticsearch.
Hope this helps, let me know if I can offer more explanation.
Upvotes: 10