zverok

Reputation: 1400

Adding additional fields to ElasticSearch terms aggregation

Indexed documents are like:

{
  id: 1, 
  title: 'Blah',
  ...
  platform: {id: 84, url: 'http://facebook.com', name: 'Facebook'},
  ...
}

What I want is to count documents per platform and output those stats. For counting, I can use a terms aggregation with platform.id as the field:

aggs: {
  platforms: {
    terms: {field: 'platform.id'}
  }
}

This way I receive the stats as multiple buckets that look like {key: 8, doc_count: 162511}, as expected.

Now, can I somehow also add platform.name and platform.url to those buckets (for pretty output of the stats)? The best I've come up with looks like:

aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      name: {terms: {field: 'platform.name'}},
      url: {terms: {field: 'platform.url'}}
    }
  }
}

This does, in fact, work, and returns a pretty complicated structure in each bucket:

{key: 7,
  doc_count: 528568,
  url:
   {doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{key: "http://facebook.com", doc_count: 528568}]},
  name:
   {doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{key: "Facebook", doc_count: 528568}]}},

Of course, the platform's name and url could be extracted from this structure (like bucket.url.buckets.first.key), but is there a cleaner and simpler way to do the task?
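
For reference, the extraction from the nested terms buckets could be sketched roughly like this in Ruby. This assumes the response has been parsed from JSON into hashes with string keys; the helper name flatten_terms_bucket and the sample bucket are made up for illustration.

```ruby
# Hypothetical helper: pulls name and url out of the nested terms
# sub-aggregations of a single platform bucket.
def flatten_terms_bucket(bucket)
  {
    id: bucket["key"],
    count: bucket["doc_count"],
    # Each sub-aggregation is expected to hold exactly one bucket,
    # so the first key is taken; dig returns nil if it is missing.
    name: bucket.dig("name", "buckets", 0, "key"),
    url: bucket.dig("url", "buckets", 0, "key")
  }
end

# Sample bucket shaped like the one shown above.
terms_bucket = {
  "key" => 7,
  "doc_count" => 528568,
  "url" => {
    "doc_count_error_upper_bound" => 0,
    "sum_other_doc_count" => 0,
    "buckets" => [{ "key" => "http://facebook.com", "doc_count" => 528568 }]
  },
  "name" => {
    "doc_count_error_upper_bound" => 0,
    "sum_other_doc_count" => 0,
    "buckets" => [{ "key" => "Facebook", "doc_count" => 528568 }]
  }
}

terms_row = flatten_terms_bucket(terms_bucket)
```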

Upvotes: 41

Views: 25846

Answers (2)

zverok

Reputation: 1400

It seems the best way to express the intent is a top hits aggregation: "from each aggregated group select only one document", and then extract the platform from it:

aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      platform: {top_hits: {size: 1, _source: {include: ['platform']}}}
    }
  }
}

This way, each bucket will look like:

{"key": 7,
  "doc_count": 529939,
  "platform": {
    "hits": {
      "hits": [{
       "_source": {
        "platform": 
          {"id": 7, "name": "Facebook", "url": "http://facebook.com"}
        }
      }]
    }
  }
}

This is kinda too deep (as usual with ES), but clean: bucket.platform.hits.hits.first._source.platform
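
A minimal Ruby sketch of that extraction, turning every bucket into a flat row for output. It assumes the search response is already parsed into hashes with string keys; platform_stats and the sample data are hypothetical names used only for illustration.

```ruby
# Hypothetical helper: flattens terms + top_hits buckets into simple rows.
# `response` is assumed to be the parsed JSON search response.
def platform_stats(response)
  buckets = response.dig("aggregations", "platforms", "buckets") || []
  buckets.map do |bucket|
    # top_hits with size 1 yields at most one hit per bucket.
    platform = bucket.dig("platform", "hits", "hits", 0, "_source", "platform") || {}
    {
      id: bucket["key"],
      name: platform["name"],
      url: platform["url"],
      count: bucket["doc_count"]
    }
  end
end

# Sample response shaped like the bucket shown above.
sample_response = {
  "aggregations" => {
    "platforms" => {
      "buckets" => [
        {
          "key" => 7,
          "doc_count" => 529939,
          "platform" => {
            "hits" => {
              "hits" => [
                { "_source" => { "platform" => { "id" => 7, "name" => "Facebook", "url" => "http://facebook.com" } } }
              ]
            }
          }
        }
      ]
    }
  }
}

rows = platform_stats(sample_response)
rows.each { |row| puts "#{row[:name]} (#{row[:url]}): #{row[:count]}" }
```

Using dig keeps the long path in one place and returns nil instead of raising if a bucket happens to have no hits.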

Upvotes: 75

Val

Reputation: 217344

If you don't necessarily need the value of platform.id, you could get away with a single aggregation instead, using a script that concatenates the two fields name and url:

aggs: {
  platforms: {
    terms: {script: 'doc["platform.name"].value + "," + doc["platform.url"].value'}
  }
}
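
On the client side, each bucket key then has to be split back into its two parts. A rough Ruby sketch follows, assuming the platform name itself never contains a comma (otherwise a different separator would be needed in the script); split_platform_key and the sample bucket are hypothetical.

```ruby
# Hypothetical helper: splits a "name,url" bucket key back into fields.
# Assumes the name contains no comma; the limit of 2 keeps any commas
# that appear inside the url intact.
def split_platform_key(bucket)
  name, url = bucket["key"].split(",", 2)
  { name: name, url: url, count: bucket["doc_count"] }
end

# Sample bucket as the scripted terms aggregation would presumably return it.
scripted_bucket = { "key" => "Facebook,http://facebook.com", "doc_count" => 528568 }

scripted_row = split_platform_key(scripted_bucket)
```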

Upvotes: 3
