4the3eam
4the3eam

Reputation: 194

Date_histogram aggregation returns bad results

I had to create aggregation that counts number of documents containing in date ranges. My query looks like:

{  
   "query":{  
      "range":{  
         "doc.createdTime":{  
            "gte":1483228800000,
            "lte":1485907199999
         }
      }
   },
   "size":0,
   "aggs":{  
      "by_day":{  
         "histogram":{  
            "field":"doc.createdTime",
            "interval":"604800000ms",
            "format":"yyyy-MM-dd'T'HH:mm:ssZZ",
            "min_doc_count":0,
            "extended_bounds":{  
               "min":1483228800000,
               "max":1485907199999
            }
         }
      }
   }
}

Interval: 604800000 equals to 7 days.

As a result, I recive these:

"aggregations": {
    "by_day": {
      "buckets": [
        {
          "key_as_string": "2016-12-29T00:00:00+00:00",
          "key": 1482969600000,
          "doc_count": 0
        },
        {
          "key_as_string": "2017-01-05T00:00:00+00:00",
          "key": 1483574400000,
          "doc_count": 603
        },
        {
          "key_as_string": "2017-01-12T00:00:00+00:00",
          "key": 1484179200000,
          "doc_count": 3414
        },
        {
          "key_as_string": "2017-01-19T00:00:00+00:00",
          "key": 1484784000000,
          "doc_count": 71551
        },
        {
          "key_as_string": "2017-01-26T00:00:00+00:00",
          "key": 1485388800000,
          "doc_count": 105652
        }
      ]
    }
  }

As You can mantion that my buckets starts from 29/12/2016, but as a range query do not cover this date. I expect my buckets should start from 01/01/2017 as I pointed in the range query. This problem occurs only in query with interval with number of days greater then 1. In case of any other intervals it works fine. I've tried with day, months and hours and it works fine. I've tried also to use filtered aggs and only then use date_histogram. Result is the same. I'm using Elasticsearch 2.2.0 version.

And the question is how I can force date_histogram to start from date I need?

Upvotes: 2

Views: 1219

Answers (1)

IgorekPotworek
IgorekPotworek

Reputation: 1335

Try to add offset param with value calculated from given formula:

value = start_date_in_ms % week_in_ms = 1483228800000 % 604800000 = 259200000

{
  "query": {
    "range": {
      "doc.createdTime": {
        "gte": 1483228800000,
        "lte": 1485907199999
      }
    }
  },
  "size": 0,
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "doc.createdTime",
        "interval": "604800000ms",
        "offset": "259200000ms",
        "format": "yyyy-MM-dd'T'HH:mm:ssZZ",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": 1483228800000,
          "max": 1485907199999
        }
      }
    }
  }
}

Upvotes: 3

Related Questions