Basil the dev
Basil the dev

Reputation: 188

Elasticseach get distinct year from date and document count

I have data and request like this

"birth": "2003-02-01T15:00:00.000Z",
    "aggs": {
        "aaa": {
            "terms": {
                "field": "birth",
                "format": "yyyy",
                "size": 100
            }
        }
    }
}

I want get the result as a list of year and the number of document for each year current result look like this:

                {
                    "key": 988556400000,
                    "key_as_string": "2001",
                    "doc_count": 2295
                },
                {
                    "key": 1375369200000,
                    "key_as_string": "2013",
                    "doc_count": 2284
                },
                {
                    "key": 1054479600000,
                    "key_as_string": "2003",
                    "doc_count": 2281
                },
                {
                    "key": 1183215600000,
                    "key_as_string": "2007",
                    "doc_count": 2276
                },
                {
                    "key": 1009810800000,
                    "key_as_string": "2001",
                    "doc_count": 2231
                },
                {
                    "key": 1170255600000,
                    "key_as_string": "2007",
                    "doc_count": 1695
                },

If key_as_string is the actual year so I expect it should contain only one value.

Upvotes: 0

Views: 272

Answers (1)

jaspreet chahal
jaspreet chahal

Reputation: 9099

Terms Aggregations will create bucket for each unique "birth" date and format field will only output year from that.

So you are getting duplicate values for year as underlying date are different.

You need to use date_histogram instead. It will bucket documents based on an interval

Query:

{
  "size": 0, 
  "aggs": {
    "NAME": {
      "date_histogram": {
        "field": "birth",
        "interval": "year"
      }
    }
  }
}

Result:

"aggregations" : {
    "NAME" : {
      "buckets" : [
        {
          "key_as_string" : "2003-01-01T00:00:00.000Z",
          "key" : 1041379200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2004-01-01T00:00:00.000Z", --> 2 docs under year 2004
          "key" : 1072915200000,
          "doc_count" : 2
        }
      ]
    }
  }

You can specify "format": "yyyy" if you want to see only year as key Query:

{
  "size": 0, 
  "aggs": {
    "NAME": {
      "date_histogram": {
        "field": "birth",
        "interval": "year",
        "format": "yyyy" --> note
      }
    }
  }
}

Upvotes: 1

Related Questions