ElasticSearch: Filter by distinct count during aggregation

Question

The following query returns the distinct Ids, ordered by document count (largest first). What I would like to do is include only those Ids whose total number of documents is less than 2000.

{
  "size": "0",
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2020-10-20T00:00:00",
        "lt": "2020-10-21T00:00:00"
      }
    }
  },
  "aggs": {
    "ids": {
      "terms": {
        "field": "Id.keyword",
        "size": 1000
      }
    }
  }
}

I tried adding a filter on 'doc_count', but that didn't help. How do I do this?

Bhavya · Accepted Answer

You can filter the buckets using the bucket_selector aggregation.

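The Bucket Selector Aggregation is a parent pipeline aggregation that executes a script to determine whether the current bucket will be retained in the parent multi-bucket aggregation. In the query below, buckets_path exposes each bucket's document count (_count) to the script as params.values, and the script keeps only the buckets where that count is below 2000.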

{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2020-10-20T00:00:00",
        "lt": "2020-10-21T00:00:00"
      }
    }
  },
  "aggs": {
    "ids": {
      "terms": {
        "field": "Id.keyword",
        "size": 1000
      },
      "aggs": {
        "count_filter": {
          "bucket_selector": {
            "buckets_path": {
              "values": "_count"
            },
            "script": "params.values < 2000"
          }
        }
      }
    }
  }
}
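
If you build this request from application code, here is a minimal Python sketch of the same body with the date range and threshold as arguments. The function names build_query and keep_small_buckets are hypothetical helpers made up for illustration, not part of any Elasticsearch client. keep_small_buckets applies the same condition client-side to the "buckets" list of a terms aggregation response, which can be handy for checking what the selector should return.

```python
import json


def build_query(start, end, max_docs, field="Id.keyword", size=1000):
    """Build the search body from the answer with a configurable threshold.

    Hypothetical helper: it only assembles a dict, it does not contact a cluster.
    """
    return {
        "size": 0,  # no hits, aggregations only
        "query": {"range": {"@timestamp": {"gte": start, "lt": end}}},
        "aggs": {
            "ids": {
                "terms": {"field": field, "size": size},
                "aggs": {
                    "count_filter": {
                        "bucket_selector": {
                            # _count is the doc_count of each terms bucket
                            "buckets_path": {"values": "_count"},
                            "script": f"params.values < {int(max_docs)}",
                        }
                    }
                },
            }
        },
    }


def keep_small_buckets(buckets, max_docs):
    """Client-side equivalent of the selector: keep buckets with doc_count < max_docs."""
    return [b for b in buckets if b["doc_count"] < max_docs]


if __name__ == "__main__":
    body = build_query("2020-10-20T00:00:00", "2020-10-21T00:00:00", 2000)
    print(json.dumps(body, indent=2))
```

One thing to keep in mind: the selector only sees the buckets that the terms aggregation returns. With "size": 1000 and the default ordering by descending document count, Ids outside the 1000 largest buckets are never evaluated, so you may need a larger size if you have more distinct Ids than that.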
