Sharmiko

Reputation: 623

Elasticsearch date histogram

I am using an Elasticsearch date histogram aggregation on the @timestamp field. This is part of the query:

'stats': {
    'date_histogram': {
        'field': '@timestamp',
        'interval': '1h',
        'format': 'yyyy-MM-dd H:m:s'
    }
}

and the mapping of @timestamp:

"@timestamp": {
    "type": "date"
}

My time interval is 1h, but I also need to extract the minute information from the timestamp without aggregating on a 1m interval. I tried specifying the format of the key's string representation and got the following output:

'key_as_string': '2020-11-07 10:0:0'
'key': 1604743200000

Is there a way to include minutes in the aggregation results, either in key or in key_as_string?

An example @timestamp value indexed in Elasticsearch:

'@timestamp': '2020-12-09T13:50:46.056000Z'

Upvotes: 2

Views: 6604

Answers (1)

Joe - Check out my books

Reputation: 16895

Histogram values are rounded down to the closest bucket, obeying the formula

bucket_key = Math.floor(value / interval) * interval

It may seem useful to show the exact minutes when a bucket holds precisely one value, but histograms usually aggregate many values, so minute-based bucket keys do not really make sense when working with hourly intervals.
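To make the rounding concrete, here is a minimal Python sketch of the formula above applied to the example @timestamp from the question. The helper names (bucket_key, to_millis) are made up for illustration, and it assumes a fixed 1h interval in UTC; calendar-aware intervals or a time_zone setting may behave differently.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
HOUR_MS = 60 * 60 * 1000  # the 1h interval expressed in milliseconds


def to_millis(ts: datetime) -> int:
    """Convert an aware datetime to epoch milliseconds using integer math."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def bucket_key(value_ms: int, interval_ms: int) -> int:
    """Round an epoch-millisecond value down to the start of its bucket."""
    return (value_ms // interval_ms) * interval_ms


# The example document from the question: 2020-12-09T13:50:46.056000Z
sample = datetime(2020, 12, 9, 13, 50, 46, 56000, tzinfo=timezone.utc)
key = bucket_key(to_millis(sample), HOUR_MS)
key_dt = EPOCH + timedelta(milliseconds=key)

# The minutes and seconds are gone: the key is the start of the hour.
print(key_dt.strftime("%Y-%m-%d %H:%M:%S"))  # 2020-12-09 13:00:00
```

Whatever format you put on the histogram, it is applied to this rounded key, which is why the minutes always come out as 0.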

That said, date histograms do accept sub-aggregations, so if you'd like to retrieve the individual docs' @timestamps in the desired format, you could use a top_hits aggregation with script_fields:

{
  "size": 0,
  "aggs": {
    "stats": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "format": "yyyy-MM-dd H:m:s"
      },
      "aggs": {
        "concrete_values": {
          "top_hits": {
            "script_fields": {
              "ts_with_minutes": {
                "script": {
                  "source": "LocalDateTime.ofInstant(Instant.ofEpochMilli(doc['@timestamp'].value.millis), ZoneId.of('UTC')).format(DateTimeFormatter.ofPattern('yyyy-MM-dd H:m:s'))"
                }
              }
            },
            "size": 10
          }
        }
      }
    }
  }
}
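On the client side, the formatted timestamps sit fairly deep in the response. The sketch below shows one way to collect them per hourly bucket in Python. The function name timestamps_per_bucket and the sample_response dict are hypothetical: the sample is a hand-written fragment shaped like the query's response, not real cluster output.

```python
from typing import Dict, List


def timestamps_per_bucket(response: dict) -> Dict[str, List[str]]:
    """Map each hourly bucket's key_as_string to its hits' formatted timestamps."""
    result: Dict[str, List[str]] = {}
    for bucket in response["aggregations"]["stats"]["buckets"]:
        hits = bucket["concrete_values"]["hits"]["hits"]
        # Script field values come back wrapped in a list, hence the [0].
        result[bucket["key_as_string"]] = [
            hit["fields"]["ts_with_minutes"][0] for hit in hits
        ]
    return result


# Hypothetical, hand-written response fragment for illustration only.
sample_response = {
    "aggregations": {
        "stats": {
            "buckets": [
                {
                    "key_as_string": "2020-12-09 13:0:0",
                    "key": 1607518800000,
                    "doc_count": 1,
                    "concrete_values": {
                        "hits": {
                            "hits": [
                                {"fields": {"ts_with_minutes": ["2020-12-09 13:50:46"]}}
                            ]
                        }
                    },
                }
            ]
        }
    }
}

print(timestamps_per_bucket(sample_response))
```

Keep in mind that top_hits only returns up to "size" docs per bucket (10 here), so this gives a sample of each hour's timestamps rather than all of them.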

Alternatively, you might be interested in the timestamps that occur most often, with keys formatted down to the minute (the seconds are left out of the format):

{
  "size": 0,
  "aggs": {
    "stats": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "format": "yyyy-MM-dd H:m:s"
      },
      "aggs": {
        "most_represented_timestamps": {
          "terms": {
            "field": "@timestamp",
            "format": "yyyy-MM-dd H:m",
            "size": 10
          }
        }
      }
    }
  }
}
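For comparison, this is roughly what "most frequent timestamps per minute" means when computed client-side. It is only an illustrative Python sketch over made-up input values (apart from the first, which is the example from the question), not what Elasticsearch runs internally. It is worth checking on real data whether the terms buckets are actually merged per minute or only displayed per minute, since the format parameter governs how key_as_string is rendered.

```python
from collections import Counter
from datetime import datetime
from typing import List, Tuple


def most_common_minutes(timestamps: List[str], size: int = 10) -> List[Tuple[str, int]]:
    """Count timestamps after truncating each one to minute precision."""
    counts: Counter = Counter()
    for raw in timestamps:
        ts = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ")
        counts[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return counts.most_common(size)


# Illustrative values; only the first comes from the question.
docs = [
    "2020-12-09T13:50:46.056000Z",
    "2020-12-09T13:50:59.000000Z",
    "2020-12-09T13:51:02.500000Z",
]

print(most_common_minutes(docs))
# [('2020-12-09 13:50', 2), ('2020-12-09 13:51', 1)]
```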

Upvotes: 3
