Roman Kolbovich

Reputation: 89

Make a flat array from Elasticsearch query results

I have an index with the following documents (simplified):

{
    "user" : "j.johnson",
    "certifications" : [{
            "certification_date" : "2013-02-09T00:00:00+03:00",
            "previous_level" : "No Level",
            "obtained_level" : "Junior"
        }, {
            "certification_date" : "2014-05-26T00:00:00+03:00",
            "previous_level" : "Junior",
            "obtained_level" : "Middle"
        }
    ]
}

I just want a flat list of all certifications passed by all users where certification_date > 2014-01-01. It should be a fairly large array like this:

[{
        "certification_date" : "2014-09-08T00:00:00+03:00",
        "previous_level" : "No Level",
        "obtained_level" : "Junior"
    }, {
        "certification_date" : "2014-05-26T00:00:00+03:00",
        "previous_level" : "Junior",
        "obtained_level" : "Middle"
    }, {
        "certification_date" : "2015-01-26T00:00:00+03:00",
        "previous_level" : "Junior",
        "obtained_level" : "Middle"
    }
    ...
]

It doesn't seem to be a hard task, but I wasn't able to find an easy way to do it.

Upvotes: 1

Views: 2602

Answers (1)

Sloan Ahrens

Reputation: 8718

I would do it with a parent/child relationship, though you will have to reorganize your data. I don't think you can get what you want with your current schema.

More concretely, I set up an index like this, with user as parent and certification as child:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
   },
   "mappings": {
      "user": {
         "properties": {
            "user_name": { "type": "string" }
         }
      },
      "certification":{
          "_parent": { "type": "user" },
          "properties": {
              "certification_date": { "type": "date" },
              "previous_level": { "type": "string" },
              "obtained_level": { "type": "string" }
          }
      }
   }
}

Then I added some docs:

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"user","_id":1}}
{"user_name":"j.johnson"}
{"index":{"_index":"test_index","_type":"certification","_parent":1}}
{"certification_date" : "2013-02-09T00:00:00+03:00","previous_level" : "No Level","obtained_level" : "Junior"}
{"index":{"_index":"test_index","_type":"certification","_parent":1}}
{"certification_date" : "2014-05-26T00:00:00+03:00","previous_level" : "Junior","obtained_level" : "Middle"}
{"index":{"_index":"test_index","_type":"user","_id":2}}
{ "user_name":"b.bronson"}
{"index":{"_index":"test_index","_type":"certification","_parent":2}}
{"certification_date" : "2013-09-05T00:00:00+03:00","previous_level" : "No Level","obtained_level" : "Junior"}
{"index":{"_index":"test_index","_type":"certification","_parent":2}}
{"certification_date" : "2014-07-20T00:00:00+03:00","previous_level" : "Junior","obtained_level" : "Middle"}
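
If your existing documents are in the nested shape from the question, a bulk body like the one above could be generated with a short script. This is only a rough sketch: the function name to_bulk_lines is hypothetical, and using a running counter as the user _id is my assumption (any stable id would do).

```python
import json

def to_bulk_lines(user_docs, index="test_index"):
    """Turn nested user documents into a parent/child _bulk body (NDJSON)."""
    lines = []
    # Assumption: a 1-based counter serves as the parent id.
    for user_id, doc in enumerate(user_docs, start=1):
        # Parent document: one per user.
        lines.append(json.dumps({"index": {"_index": index, "_type": "user", "_id": user_id}}))
        lines.append(json.dumps({"user_name": doc["user"]}))
        # Child documents: one per certification, linked through _parent.
        for cert in doc.get("certifications", []):
            lines.append(json.dumps({"index": {"_index": index, "_type": "certification", "_parent": user_id}}))
            lines.append(json.dumps(cert))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

user_docs = [{
    "user": "j.johnson",
    "certifications": [
        {"certification_date": "2013-02-09T00:00:00+03:00", "previous_level": "No Level", "obtained_level": "Junior"},
        {"certification_date": "2014-05-26T00:00:00+03:00", "previous_level": "Junior", "obtained_level": "Middle"},
    ],
}]

bulk_body = to_bulk_lines(user_docs)
print(bulk_body)
```

The resulting string can be sent as the body of POST /test_index/_bulk.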

Now I can just search certifications with a range filter:

POST /test_index/certification/_search
{
   "query": {
      "constant_score": {
         "filter": {
            "range": {
               "certification_date": {
                  "gte": "2014-01-01"
               }
            }
         }
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "certification",
            "_id": "QGXHp7JZTeafWYzb_1FZiA",
            "_score": 1,
            "_source": {
               "certification_date": "2014-05-26T00:00:00+03:00",
               "previous_level": "Junior",
               "obtained_level": "Middle"
            }
         },
         {
            "_index": "test_index",
            "_type": "certification",
            "_id": "yvO2A9JaTieI5VHVRikDfg",
            "_score": 1,
            "_source": {
               "certification_date": "2014-07-20T00:00:00+03:00",
               "previous_level": "Junior",
               "obtained_level": "Middle"
            }
         }
      ]
   }
}

This structure is still not completely flat the way you asked for, but I think this is as close as ES will let you get.

Here is the code I used:

http://sense.qbox.io/gist/3c733ec75e6c0856fa2772cc8f67bd7c00aba637
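
If you really need the plain array, the last step can be done on the client by pulling _source out of each hit. A minimal sketch, assuming the search response has already been parsed into a Python dict; flatten_hits is a hypothetical helper name, and the response below is trimmed from the output above:

```python
import json

def flatten_hits(response):
    """Return the _source of every hit as a flat list."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]

# Trimmed copy of the search response shown above.
response = {
    "hits": {
        "total": 2,
        "hits": [
            {
                "_type": "certification",
                "_id": "QGXHp7JZTeafWYzb_1FZiA",
                "_source": {
                    "certification_date": "2014-05-26T00:00:00+03:00",
                    "previous_level": "Junior",
                    "obtained_level": "Middle",
                },
            },
            {
                "_type": "certification",
                "_id": "yvO2A9JaTieI5VHVRikDfg",
                "_source": {
                    "certification_date": "2014-07-20T00:00:00+03:00",
                    "previous_level": "Junior",
                    "obtained_level": "Middle",
                },
            },
        ],
    }
}

flat = flatten_hits(response)
print(json.dumps(flat, indent=4))
```

Keep in mind that a search returns only the first 10 hits by default, so for a large result set you would probably need to raise "size" or page through the results with scroll.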

Upvotes: 1
