Benjamin

Reputation: 185

Elasticsearch sorting by array column

How can I sort records by a field that holds an array of numbers? For example:

[1, 32, 26, 16]
[1, 32, 10, 1500]
[1, 32, 1,  16]
[1, 32, 2,  17]

The expected result is:

[1, 32, 1,  16]
[1, 32, 2,  17]
[1, 32, 10, 1500]
[1, 32, 26, 16]

Elasticsearch has a sort mode option: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-request-sort.html#_sort_mode_option. But each mode reduces the whole array to a single value, so none of them is appropriate here.
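To illustrate why, here is a minimal Python sketch (not Elasticsearch code) that reduces each array to one number the way the min, max, sum and avg sort modes do, then sorts by that number. Python's stable sort stands in for whatever tie-breaking Elasticsearch would apply; the modes table is only illustrative.

```python
from statistics import mean

rows = [
    [1, 32, 26, 16],
    [1, 32, 10, 1500],
    [1, 32, 1, 16],
    [1, 32, 2, 17],
]
expected = [
    [1, 32, 1, 16],
    [1, 32, 2, 17],
    [1, 32, 10, 1500],
    [1, 32, 26, 16],
]

# Each sort mode collapses the whole array into a single value per document.
modes = {"min": min, "max": max, "sum": sum, "avg": mean}

for name, reducer in modes.items():
    ordered = sorted(rows, key=reducer)
    print(name, ordered == expected)  # prints False for every mode
```

With min all four documents tie at 1, and sum/avg put [1, 32, 26, 16] before [1, 32, 10, 1500], so no single-value reduction reproduces the element-by-element order.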

Ruby can sort an array of number arrays: its Array#<=> method compares two arrays element by element ("Each object in each array is compared").
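Python lists compare the same way as Ruby's Array#<=>: element by element, with the first unequal element deciding. A minimal sketch confirming that this ordering gives exactly the expected result:

```python
rows = [
    [1, 32, 26, 16],
    [1, 32, 10, 1500],
    [1, 32, 1, 16],
    [1, 32, 2, 17],
]

# Lists compare lexicographically: the first pair of unequal elements decides.
for row in sorted(rows):
    print(row)
# [1, 32, 1, 16]
# [1, 32, 2, 17]
# [1, 32, 10, 1500]
# [1, 32, 26, 16]
```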

How to do the same with elasticsearch?

P.S. Sorry for my English

Upvotes: 4

Views: 7881

Answers (1)

Nikolay Vasiliev

Reputation: 6076

The Elasticsearch documentation on arrays warns:

Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to be able to do this then you should use the nested datatype instead of the object datatype.

This is explained in more detail in Nested datatype.

You cannot access array elements by their index at sort time. The values are stored in a Lucene index, which basically allows only set operations ("give docs that have array element = x" or "give docs that do not have array element = x"), and the original order of the elements is not preserved.

However, by default the original JSON document is stored on disk, and scripts can access it through the _source field.

You have two options:

  1. use script-based sorting
  2. store the sort value explicitly as a string

Let's discuss these options in a bit more detail.

1. Script-based sorting

The first option is more like a hack. Let's assume you have a mapping like this:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_array": {
          "type": "integer"
        }
      }
    }
  }
}

Then you can achieve the intended behavior with a scripted sort:

POST my_index/my_type/_search
{
      "sort" : {
        "_script" : {
            "script" : "String s = ''; for(int i = 0; i < params._source.my_array.length; ++i) {s += params._source.my_array[i] + ','}  s",
            "type" : "string",
            "order" : "asc"
        }
    }
}

(I tested this code on Elasticsearch 5.4; I believe earlier versions have something equivalent. Please consult the relevant documentation if you need details for an earlier version, such as 1.4.)

The output will be:

  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": null,
        "_source": {
          "my_array": [
            1,
            32,
            1,
            16
          ]
        },
        "sort": [
          "1,32,1,16,"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": null,
        "_source": {
          "my_array": [
            1,
            32,
            10,
            1500
          ]
        },
        "sort": [
          "1,32,10,1500,"
        ]
      }
    ]
  }

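Note that this solution will be slow and memory-consuming, since it has to read _source from disk for every document being sorted and load it into memory. Also note that the keys are compared as strings, not as numbers: "1,32,10,1500," sorts before "1,32,2,17,", which does not match the expected order from the question. Padding every number with leading zeros to a fixed width before concatenating (assuming non-negative integers) makes string order match numeric order.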
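The scripted sort above produces comma-joined strings, which are compared character by character. The minimal Python sketch below mimics that concatenation to show where string order and numeric order diverge, and how zero-padding realigns them. The helper names naive_key and padded_key and the width of 10 digits are illustrative assumptions, and padding only works for non-negative integers.

```python
rows = [
    [1, 32, 26, 16],
    [1, 32, 10, 1500],
    [1, 32, 1, 16],
    [1, 32, 2, 17],
]

def naive_key(arr):
    # Same shape as the script's output, e.g. "1,32,10,1500,"
    return "".join(f"{n}," for n in arr)

def padded_key(arr, width=10):
    # Zero-pad every element so that string order matches numeric order.
    # Assumes non-negative integers with at most `width` digits
    # (10 digits covers any non-negative 32-bit integer).
    return "".join(f"{n:0{width}d}," for n in arr)

print([naive_key(r) for r in sorted(rows, key=naive_key)])
# ['1,32,1,16,', '1,32,10,1500,', '1,32,2,17,', '1,32,26,16,']
print(sorted(rows, key=padded_key) == sorted(rows))
# True
```

Inside the Painless script the same padding would have to be done by hand before appending each number.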

2. Denormalization

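Storing the sort value explicitly as a string is more in line with the Elasticsearch approach, which favors denormalization. The idea is to do the concatenation before inserting the document into the index, zero-padding each number to a fixed width so that string order matches numeric order, and then to sort on that string field (mapped as keyword in 5.x), which gives a robust sort without any scripting.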
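A minimal sketch of the denormalization step, assuming documents are prepared in Python before indexing. The extra field name my_array_sort and the helper with_sort_key are hypothetical, and the final sorted call only simulates an ascending sort on that field on the client side.

```python
def with_sort_key(doc, width=10):
    # Return a copy of the document with a precomputed string sort key.
    # Zero-padding keeps string order consistent with numeric order;
    # assumes non-negative integers of at most `width` digits.
    key = ",".join(f"{n:0{width}d}" for n in doc["my_array"])
    return {**doc, "my_array_sort": key}

# Hypothetical mapping fragment (Elasticsearch 5.x style): the sort field
# is a keyword so it can be sorted on directly.
mapping = {"properties": {
    "my_array": {"type": "integer"},
    "my_array_sort": {"type": "keyword"},
}}

docs = [with_sort_key({"my_array": a}) for a in (
    [1, 32, 26, 16], [1, 32, 10, 1500], [1, 32, 1, 16], [1, 32, 2, 17],
)]

# Simulates "sort": [{"my_array_sort": "asc"}] on the client side.
for d in sorted(docs, key=lambda d: d["my_array_sort"]):
    print(d["my_array"])
# [1, 32, 1, 16]
# [1, 32, 2, 17]
# [1, 32, 10, 1500]
# [1, 32, 26, 16]
```

The trade-off is that the key must be recomputed whenever my_array changes, in exchange for a plain field sort at query time.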

Choose whichever solution better fits your needs.

Hope that helps!

Upvotes: 2
