Bucket by fields present in returned documents using Elasticsearch

Question

Our indexed documents do not have a completely fixed schema; not every field appears in every document. Is there a way to create buckets based on the fields present in a set of documents (i.e. the results of a query), with a count of how many documents contain each field? For example, suppose these made-up documents are the results of a query:

{"name":"Bob","field1":"value","field2":"value2","field3":"value3"}
{"name":"Sue","field2":"value4","field3":"value5"}
{"name":"Ali","field1":"value6","field2":"value7"}
{"name":"Joe","field3":"value8"}

This is the information (not format) I want to extract:

  name: 4
field1: 2
field2: 3
field3: 3

Is there a way I can aggregate and count to get those results?

Andrei Stefan · Accepted Answer

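Yeah, I think you can do it with one filter aggregation per field, each wrapping an exists filter. Note that search_type=count only suppresses the hits; it was removed in Elasticsearch 5.0, so on newer versions put "size": 0 in the request body instead (and drop the some_type path segment where mapping types no longer exist):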

GET /some_index/some_type/_search?search_type=count
{
  "aggs": {
    "name_bucket": {
      "filter" : { "exists" : { "field" : "name" } }
    },
    "field1_bucket": {
      "filter" : { "exists" : { "field" : "field1" } }
    },
    "field2_bucket": {
      "filter" : { "exists" : { "field" : "field2" } }
    },
    "field3_bucket": {
      "filter" : { "exists" : { "field" : "field3" } }
    }
  }
}

And you get something like this:

   "aggregations": {
      "field3_bucket": {
         "doc_count": 3
      },
      "field1_bucket": {
         "doc_count": 2
      },
      "field2_bucket": {
         "doc_count": 3
      },
      "name_bucket": {
         "doc_count": 4
      }
   }
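
If the list of fields is long or only known at runtime, the request body can be generated rather than written by hand. The following is a minimal Python sketch, not part of the original answer: the helper names build_field_presence_aggs and field_counts are hypothetical, it uses "size": 0 in place of search_type=count, and it only builds and parses plain dictionaries, so sending the body to the cluster is left to whatever client you use.

```python
import json


def build_field_presence_aggs(fields, query=None):
    """Build a search body with one `filter` aggregation per field.

    Each aggregation wraps an `exists` filter, mirroring the request above.
    """
    body = {
        "size": 0,  # assumption: suppress hits this way instead of search_type=count
        "aggs": {
            f"{field}_bucket": {"filter": {"exists": {"field": field}}}
            for field in fields
        },
    }
    if query is not None:
        # Optional: restrict the counts to documents matching a query.
        body["query"] = query
    return body


def field_counts(response):
    """Map each field name to the doc_count of its `<field>_bucket` aggregation."""
    suffix = "_bucket"
    return {
        name[: -len(suffix)]: agg["doc_count"]
        for name, agg in response["aggregations"].items()
        if name.endswith(suffix)
    }


body = build_field_presence_aggs(["name", "field1", "field2", "field3"])
print(json.dumps(body, indent=2))

# The response shown above, copied here so the parsing step can be checked offline.
sample_response = {
    "aggregations": {
        "field3_bucket": {"doc_count": 3},
        "field1_bucket": {"doc_count": 2},
        "field2_bucket": {"doc_count": 3},
        "name_bucket": {"doc_count": 4},
    }
}
print(field_counts(sample_response))
```

Passing a query (for example {"match_all": {}}) as the second argument scopes the counts to the documents that query returns, which is what the question asks for.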
