Reputation: 959
I created a case-insensitive analyzer as
PUT /dhruv3
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"tokenizer": "keyword",
"filter": [ "lowercase", "asciifolding" ]
}
}
}
}
},
"mappings": {
"test": {
"properties": {
"about": {
"type": "string",
"analyzer": "analyzer_keyword"
},
"firsName": {
"type": "string"
}
}
}
}
}
and used it in the mapping. The about field is supposed to contain alphanumeric and special characters. Then I inserted some documents with the about field set to 1234, `pal, pal, ~pal. Besides searching, I also need the results sorted. Searching works well, but when I try to sort them with
GET dhruv/test/_search
{
"sort": [
{
"about": {
"order": "asc"
}
}
]
}
I get the results with the about field in this order: 1234, `pal, pal, ~pal. But I expect special characters first, then numbers, then letters.
I did some homework and learned that this is because of their ASCII values. So I searched the internet and even tried asciifolding, but it didn't work. I know there is a solution somewhere, but I can't figure it out. Please guide me.
Upvotes: 0
Views: 2896
Reputation: 4537
The asciifolding filter has nothing to do with what you're trying to achieve. ASCIIFoldingFilter.java has a wealth of information; the filter merely converts Unicode characters such as \uFF5E to their ASCII equivalents, when such an equivalent exists.
Adding to @Val's answer: if you want the values sorted with special characters first, then numbers, then letters, you may want to consider a script sort such as:
GET /ascii/test/_search
{
"sort": {
"_script": {
"script": "r = doc['about'].value.chars[0]; return !r.isLetter() ? r.isDigit() ? 1 : -1 : 2",
"type": "number",
"order": "asc"
}
}
}
Also, note that this sorting may not be perfect, since the script only looks at the first character. You may want to write a more robust script that handles the entire value.
This gist is a good example of what you can achieve using embedded scripts.
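To make the "entire value" idea concrete, here is a minimal sketch of the comparison logic in Python, run outside Elasticsearch. It is only an illustration of the ordering, not a drop-in sort script: the function names char_rank and sort_key are hypothetical, and the extra values p~l and p1l are made up to show what happens past the first character.

```python
def char_rank(ch):
    """Rank one character: specials (0) < digits (1) < letters (2)."""
    if ch.isalpha():
        bucket = 2
    elif ch.isdigit():
        bucket = 1
    else:
        bucket = 0
    # Within a bucket, fall back to the character's own code point order.
    return (bucket, ch)


def sort_key(value):
    """Build a key from every character, not just the first one."""
    # Lowercase first, mirroring the lowercase filter in the analyzer.
    return [char_rank(ch) for ch in value.lower()]


values = ["pal", "1234", "~pal", "`pal", "p~l", "p1l"]
ordered = sorted(values, key=sort_key)
print(ordered)
```

With this key, p~l and p1l sort before pal, which a first-character-only script cannot guarantee. The same per-character ranking would still have to be ported to whatever scripting language your cluster allows.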
Upvotes: 1
Reputation: 217594
You're right that the sorting behavior you are seeing is due to the ASCII values of the letters and special characters being bigger than the ASCII values of digits. To be precise, looking at the ASCII table, we have the following values:
1 has the ASCII value 49
p has the ASCII value 112
~ has the ASCII value 126
The asciifolding token filter simply transforms characters and digits which are NOT in the ASCII table (i.e. outside the first 128 code points, 0 to 127) into their ASCII equivalent, if one exists (e.g. é, è, ë, ê are transformed to e). Since all the characters above are in the ASCII table, this is not what you're looking for.
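The code point ordering described above can be checked outside Elasticsearch. The following Python snippet is only an illustration (Python is not part of the original setup): it prints the code points of the leading characters from the question and sorts the four sample values with a plain string comparison.

```python
# Sample values from the question, deliberately shuffled.
values = ["pal", "~pal", "1234", "`pal"]

# Code points of the first characters involved.
code_points = {ch: ord(ch) for ch in "1`p~"}
for ch, cp in code_points.items():
    print(repr(ch), cp)

# Plain string sorting compares code points, so digits come first.
default_order = sorted(values)
print(default_order)
```

The resulting order matches what the question reports for the ascending sort on about, which is consistent with the sort simply following character codes.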
If you want the special characters to come up first in the sorted results, there are several ways.
One way to achieve it is to give values that start with a special character a negative sort value, so that they always come before values starting with a letter or digit, and then use script sorting:
{
"sort": [
{
"_script": {
"script": "return doc['about'].value.chars[0].isLetterOrDigit() ? 1 : -1",
"type": "number",
"order": "asc"
}
}
]
}
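Note that the script above only splits the values into two groups, so values inside the same group tie with each other. As a rough sketch of how a tie-breaker changes the outcome, here is the same two-bucket idea in Python, with the value itself as a secondary key. The helper name bucket is hypothetical, and the code assumes every value is non-empty.

```python
def bucket(value):
    """Return -1 if the value starts with a special character, else 1."""
    # Assumes value is non-empty; an empty string would raise IndexError.
    return 1 if value[0].isalnum() else -1


values = ["pal", "1234", "~pal", "`pal"]

# Sort by bucket first, then by the value itself to break ties.
ordered = sorted(values, key=lambda v: (bucket(v), v))
print(ordered)
```

In Elasticsearch, the analogous step would presumably be a second entry in the sort array (for example a plain sort on about) after the _script entry; that is an assumption to verify against your version.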
Upvotes: 4