Mike Placentra

Reputation: 885

Elasticsearch array only gives uniques to aggregation script

I would like to do some simple linear algebra in Elasticsearch using a scripted metric aggregation. I am storing each vector in an array field. My issue is that the script receives only the unique values from the array, rather than the full array in its original order. Is there a way to get the original array in the script?

Consider this example document:

curl -XPUT 'http://localhost:9200/arraytest/aoeu/1' -d '{
    "items": [1, 2, 2, 3]
}'

It looks right in the _source:

curl -XGET 'http://localhost:9200/arraytest/aoeu/1'

result:

{"_index":"arraytest","_type":"aoeu","_id":"1","_version":1,"found":true,"_source":{
    "items": [1, 2, 2, 3]
}}

However, it does not look right when I get the value in a script:

curl -XGET 'http://localhost:9200/arraytest/aoeu/_search?pretty&search_type=count' -d '{
    "query": {
        "match_all" : {}
    },
    "aggs": {
        "tails": {
            "scripted_metric": {
                "init_script": "_agg.items = []",
                "map_script": "_agg.items.add(doc.items)",
                "reduce_script": "items = []; for (a in _aggs) { items.add(a.items) }; return items"
            }
        }
    }
}'

result:

{
  "took" : 103,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "tails" : {
      "value" : [ [ ], [ ], [ [ 1, 2, 3 ] ], [ ], [ ] ]
    }
  }
}

I was expecting the result to include [1, 2, 2, 3], but instead it includes [1, 2, 3]. I tried accessing _source in the map script but it said there is no such field.
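
To make the symptom concrete, here is a toy Python model of what the map script appears to receive. It assumes the script sees each distinct value once, in ascending order, which is consistent with the output above; the function name is hypothetical and this is not Elasticsearch's actual code.

```python
def values_seen_by_script(source_array):
    """Toy model (assumption): each distinct value once, in ascending order."""
    return sorted(set(source_array))


source_items = [1, 2, 2, 3]                      # what _source holds
seen_items = values_seen_by_script(source_items)
print(seen_items)                                # [1, 2, 3]: the duplicate 2 is gone
```

Under this model both the duplicates and the original ordering are lost, so the vector cannot be rebuilt from the script's view alone.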

Upvotes: 0

Views: 254

Answers (1)

Mike Placentra

Reputation: 885

I found the issue in the docs, in a discussion about a different but related problem. Arrays of inner objects are flattened for indexing. This means that the set of all values of an inner object (or one of its fields) becomes an array on the root document.

Rather than relying on dynamic mapping, which indexes arrays of inner objects with the above effect, one can specify a nested object mapping. Elasticsearch then does not flatten the array of inner objects; it stores each one as a separate document and indexes them so that joins are "almost as fast" as having them embedded in the same document. Having played with it a little, I found that it is also fast for my use case, which doesn't involve joins (compared with creating a separate root document for every sub-document).

I don't think a nested mapping by itself preserves the order of the vector, so I will store each element's position in an index field on each nested document.
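
A minimal Python sketch of that idea, run on the client side: each element is paired with its position before indexing, and the vector is rebuilt by sorting on that position. The field names index and itemvalue match the mapping in this answer; the helper names are hypothetical.

```python
def to_nested_items(vector):
    """Turn a vector into nested objects that each carry their own position."""
    return [{"index": i, "itemvalue": v} for i, v in enumerate(vector)]


def from_nested_items(items):
    """Rebuild the vector, whatever order the nested objects arrive in."""
    ordered = sorted(items, key=lambda item: item["index"])
    return [item["itemvalue"] for item in ordered]


items = to_nested_items([1, 2, 2, 3])
print(from_nested_items(reversed(items)))  # [1, 2, 2, 3]
```

Because every nested object has a distinct index, duplicate values such as the two 2s survive as separate objects.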

Example mapping:

curl -XPUT 'http://localhost:9200/arraytestnested' -d '{
    "mappings" : {
        "document" : {
            "properties" : {
                "some_property" : {
                    "type" : "string"
                },
                "items": {
                    "type": "nested",
                    "properties": {
                        "index" : {
                            "type" : "long"
                        },
                        "itemvalue" : {
                            "type" : "long"
                        }
                    }
                }
            }
        }
    }
}'
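
For illustration, a document body that fits this mapping could be built as follows. This is only a sketch: the some_property value is made up, the function name is hypothetical, and the resulting JSON string would be sent with curl -d to the arraytestnested index under the document type.

```python
import json


def build_document_body(some_property, vector):
    """Serialize one document whose items field matches the nested mapping."""
    document = {
        "some_property": some_property,
        "items": [{"index": i, "itemvalue": v} for i, v in enumerate(vector)],
    }
    return json.dumps(document)


body = build_document_body("example", [1, 2, 2, 3])
print(body)
```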

Upvotes: 1
