Keshav Agarwal

Reputation: 821

Elasticsearch count if more than one document with same value exists

I want to count the field values that occur in more than one document. How can I write a DSL query to do so?

Example:

Let's say I have these documents:

{ _id:1, foo:1}
{ _id:2, foo:1}
{ _id:3, foo:3}
{ _id:4, foo:2}
{ _id:5, foo:3}

I want the number of distinct values of foo that appear in more than one document. Here, the count should be 2 (the values 1 and 3).

UPDATE

After running the terms query as:

{
   "size": 0,
   "aggs": {
      "counts": {
          "terms": {
              "field": "foo"
          }
      }
   }
}

I got this result:

'aggregations':{
    'counts':{
        'buckets':[
             {'doc_count': 221,'key': '10284'},
             {'doc_count': 71,'key': '6486'},
             {'doc_count': 71,'key': '7395'}
         ],
        'doc_count_error_upper_bound': 0,
        'sum_other_doc_count': 0
    }
}

I want an additional field, total_count, with the value 3, since there are 3 keys with a doc_count greater than 1. How can I do that?

Upvotes: 0

Views: 2295

Answers (2)

Andrei Stefan

Reputation: 52368

I don't think you can do this out of the box with ES only. You basically need a bucket count after a min_doc_count: 2 terms aggregation.

ES 5 will add this: https://github.com/elastic/elasticsearch/issues/19553 (the bucket_selector aggregation will get a _bucket_count variable that can be used). It remains to be seen whether that variable can be used in other scripts as well.
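Until then, the bucket count can be taken on the client side. Below is a minimal sketch, assuming the response has already been parsed into a Python dict. The helper names `build_duplicate_query` and `count_buckets` are hypothetical, and the `max_buckets` value is an assumption: the terms aggregation only returns the top buckets by default, so `size` has to be large enough to cover every repeated value.

```python
def build_duplicate_query(field, max_buckets=10000):
    """Build a request body whose terms aggregation keeps only
    values that occur in at least two documents."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "counts": {
                "terms": {
                    "field": field,
                    "min_doc_count": 2,   # drop values seen only once
                    "size": max_buckets,  # assumed upper bound on buckets
                }
            }
        },
    }


def count_buckets(response, agg_name="counts"):
    """Count the buckets returned for the named aggregation."""
    return len(response["aggregations"][agg_name]["buckets"])


# The response shown in the question, as a parsed dict.
sample_response = {
    "aggregations": {
        "counts": {
            "buckets": [
                {"doc_count": 221, "key": "10284"},
                {"doc_count": 71, "key": "6486"},
                {"doc_count": 71, "key": "7395"},
            ],
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
        }
    }
}

print(count_buckets(sample_response))  # prints 3
```

If `sum_other_doc_count` in the response is greater than 0, some buckets were cut off and the count is only a lower bound.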

Upvotes: 1

Val

Reputation: 217514

You can try a simple terms aggregation on the foo field like this:

{
   "size": 0,
   "aggs": {
      "counts": {
          "terms": {
              "field": "foo"
          }
      }
   }
}

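After running this on the example documents, you'll get:

  • for key 1: doc_count 2
  • for key 3: doc_count 2
  • for key 2: doc_count 1

The buckets whose doc_count is greater than 1 (keys 1 and 3 here) are the values that occur in more than one document.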
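To turn those buckets into a single number, one option is to filter them on the client. This is only a sketch: `count_repeated_values` is a hypothetical helper, and the bucket list is assumed to be what the aggregation above returns for the five example documents.

```python
def count_repeated_values(buckets):
    """Count buckets whose value appears in more than one document."""
    return sum(1 for bucket in buckets if bucket["doc_count"] > 1)


# Buckets for the example documents from the question
# (foo values 1, 1, 3, 2, 3).
example_buckets = [
    {"key": 1, "doc_count": 2},
    {"key": 3, "doc_count": 2},
    {"key": 2, "doc_count": 1},
]

print(count_repeated_values(example_buckets))  # prints 2
```

Note that a plain terms aggregation returns only the top buckets by default, so with many distinct values the `size` parameter would need to be raised for this count to be complete.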

Upvotes: 1
