roditoo
roditoo

Reputation: 11

Elasticsearch date histogram aggregation shows incorrect dates

Im using ES version 6.8, trying to do a query with the Java High Level REST Client. I'm trying to use the Histogram Date Aggregation to get the count of documents by date. This is my query:

{
   "query":{
      "bool":{
         "filter":[
            {
               "terms":{
                  "userId":[
                     "123456", "654321"
                  ],
                  "boost":1.0
               }
            },
            {
               "range":{
                  "created":{
                     "from":1672531200000,
                     "to":1675123200000,
                     "include_lower":true,
                     "include_upper":true,
                     "boost":1.0
                  }
               }
            }
         ],
         "adjust_pure_negative":true,
         "boost":1.0
      }
   },
   "aggregations":{
      "date_count":{
         "date_histogram":{
            "field":"created",
            "interval":"1d",
            "offset":0,
            "order":{
               "_key":"asc"
            },
            "keyed":false,
            "min_doc_count":0
         }
      }
   }
}

Im using the BoolQueryBuilder to filter by user id and by a range date.

I expect that if I use a range date from "2023-01-01" to "2023-01-31" it will return me the quantity of documents per day, also if the count is 0, but this is returning me empty buckets.

Then I try adding one month from "2023-01-01" to "2023-02-31" and this returns me just 2 buckets, when I want one bucket per day from "2023-01-01" to "2023-02-31".

{
   "aggregations":{
      "date_histogram#date_count":{
         "buckets":[
            {
               "key":1675209600000,
               "doc_count":3
            },
            {
               "key":1675296000000,
               "doc_count":1
            }
         ]
      }
   }
}

The "key" is numeric because we use Long in the "created" field, then I parse to a date.

Does anyone know if I am doing something wrong in the query?

Thanks!

Upvotes: 0

Views: 454

Answers (1)

saw1k
saw1k

Reputation: 131

You should change mapping for created field like this

"created": {
      "type": "date",
      "format":"epoch_second"
    }

Then on your aggregation change interval to "day" and add format field with date pattern like "yyyy-MM-dd"

You aggregation query should be something like this:

 "aggregations":{
      "date_count":{
         "date_histogram":{
            "field":"created",
            "interval":"day",
            "format" : "yyyy-MM-dd",
            "offset":0,
            "order":{
               "_key":"asc"
            },
            "keyed":false,
            "min_doc_count":0
         }
      }
   }

And the effect should be like this:

"aggregations" : {
    "date_count" : {
      "buckets" : {
        "2023-08-04" : {
          "key_as_string" : "2023-08-04",
          "key" : 1691107200000,
          "doc_count" : 1
        },
        "2023-08-05" : {
          "key_as_string" : "2023-08-05",
          "key" : 1691193600000,
          "doc_count" : 1
        },
        "2023-08-06" : {
          "key_as_string" : "2023-08-06",
          "key" : 1691280000000,
          "doc_count" : 1
        },
        "2023-08-07" : {
          "key_as_string" : "2023-08-07",
          "key" : 1691366400000,
          "doc_count" : 0
        },
        "2023-08-08" : {
          "key_as_string" : "2023-08-08",
          "key" : 1691452800000,
          "doc_count" : 1
        }
      }
    }
  }

Ofcourse you can change interval for ex month :)

Upvotes: 0

Related Questions