dimartiro

Reputation: 111

Elasticsearch sort alphabetically then numerically

I'm looking for an elegant way to sort my results alphabetically first and then numerically.

My current solution prepends a "~" to values that start with a digit, using the sort script below ("~" sorts lexicographically after "z"):

"sort": {
  "_script": {
    "script": "s = doc['name.raw'].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s.split(' ')[0][0]); if (n != null) { '~' + s } else { s }",
    "type": "string"
  }
}

However, I wonder whether there is a more elegant and perhaps more performant solution.

Input:

ZBA ABC ...
ABC SDK ...
123 RIU ...
12B BTE ...
11J TRE ...
BCA 642 ...

Desired output:

ABC SDK ...
BCA 642 ...
ZBA ABC ...
11J TRE ...
12B BTE ...
123 RIU ...

Upvotes: 4

Views: 2569

Answers (1)

Val

Reputation: 217254

You can do the same thing at indexing time using a custom analyzer that leverages a pattern_replace character filter. Doing the work once at indexing time is more performant than running a script sort at search time for every query.

It works in the same vein as your solution: the character filter prefixes every digit with a tilde ~ and leaves all other characters untouched. The difference is that this happens at indexing time, and the resulting value is indexed in the name.sort field.

PUT /tests
{
  "settings": {
    "analysis": {
      "char_filter": {
        "pre_num": {
          "type": "pattern_replace",
          "pattern": "(\\d)",
          "replacement": "~$1"
        }
      },
      "analyzer": {
        "number_tagger": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [],
          "char_filter": [
            "pre_num"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "sort": {
              "type": "string",
              "analyzer": "number_tagger",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
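
To see what ends up in name.sort, here is a minimal Python sketch that imitates the pre_num character filter followed by the keyword tokenizer. It is only a local simulation, not Elasticsearch itself: the function name sort_key is made up for illustration, and it assumes ASCII input, where Python's string comparison matches the order of the indexed terms.

```python
import re

# Imitates the "pre_num" char filter: pattern "(\d)", replacement "~$1".
# re.ASCII restricts \d to 0-9, which is how Java regexes treat \d by default.
DIGIT = re.compile(r"(\d)", re.ASCII)


def sort_key(name: str) -> str:
    """Hypothetical helper: the single term the keyword tokenizer would
    emit once every digit has been prefixed with a tilde."""
    return DIGIT.sub(r"~\1", name)


names = ["ZBA ABC", "ABC SDK", "123 RIU", "12B BTE", "11J TRE", "BCA 642"]

# Sort by the transformed value, as the query on name.sort would.
ordered = sorted(names, key=sort_key)

for name in ordered:
    print(f"{sort_key(name):<12} <- {name}")
```

Because every digit gets its own tilde, "12B BTE" becomes "~1~2B BTE" and "123 RIU" becomes "~1~2~3 RIU". After the shared prefix "~1~2", "B" compares lower than "~", so 12B sorts before 123.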

Then you can index your data:

POST /tests/test/_bulk
{"index": {}}
{"name": "ZBA ABC"}
{"index": {}}
{"name": "ABC SDK"}
{"index": {}}
{"name": "123 RIU"}
{"index": {}}
{"name": "12B BTE"}
{"index": {}}
{"name": "11J TRE"}
{"index": {}}
{"name": "BCA 642"}

Then your query can simply look like this:

POST /tests/_search
{
  "sort": {
    "name.sort": "asc"
  }
}

And the response you'll get is:

{
  "hits": {
    "hits": [
      {
        "_source": {
          "name": "ABC SDK"
        }
      },
      {
        "_source": {
          "name": "BCA 642"
        }
      },
      {
        "_source": {
          "name": "ZBA ABC"
        }
      },
      {
        "_source": {
          "name": "11J TRE"
        }
      },
      {
        "_source": {
          "name": "12B BTE"
        }
      },
      {
        "_source": {
          "name": "123 RIU"
        }
      }
    ]
  }
}

Upvotes: 3
