Reputation: 169
When I search with the following aggregation:
"aggregations": {
"codes": {
"terms": {
"field": "code"
},
"aggs": {
"dates": {
"date_range": {
"field": "created_time",
"ranges": [
{
"from": "2017-12-06T00:00:00.000",
"to": "2017-12-06T16:00:00.000"
},
{
"from": "2017-12-07T00:00:00.000",
"to": "2017-12-07T23:59:59.999"
}
]
}
}
}
}
}
I get the following result:
"aggregations": {
"codes": {
"buckets": [
{
"key": "123456",
"doc_count": 104005499,
"dates": {
"buckets": [
{
"key": "2017-12-05T20:00:00.000Z-2017-12-06T12:00:00.000Z",
"from_as_string": "2017-12-05T20:00:00.000Z",
"to_as_string": "2017-12-06T12:00:00.000Z",
"doc_count": 156643
},
{
"key": "2017-12-06T20:00:00.000Z-2017-12-07T19:59:59.999Z",
"from_as_string": "2017-12-06T20:00:00.000Z",
"to_as_string": "2017-12-07T19:59:59.999Z",
"doc_count": 11874
}
]
}
},
...
]
}
}
So now I have a list of buckets of buckets. I need a total count for each top-level bucket: the sum of the doc_count values of the buckets inside it. For example, the total count for my first bucket should be 156643 + 11874 = 168517.
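(For reference, if post-processing the response client-side were acceptable, the desired totals are just this sum; a minimal Python sketch over a parsed response, using the numbers from above:)

```python
# Client-side equivalent of the desired "total count": for each code bucket,
# sum the doc_count of its date_range sub-buckets.
response = {
    "aggregations": {
        "codes": {
            "buckets": [
                {
                    "key": "123456",
                    "doc_count": 104005499,
                    "dates": {
                        "buckets": [
                            {"doc_count": 156643},
                            {"doc_count": 11874},
                        ]
                    },
                }
            ]
        }
    }
}

totals = {
    bucket["key"]: sum(d["doc_count"] for d in bucket["dates"]["buckets"])
    for bucket in response["aggregations"]["codes"]["buckets"]
}
print(totals)  # {'123456': 168517}
```

But I would prefer to get this value from Elasticsearch directly.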
I've tried a sum_bucket pipeline aggregation:
"totalcount": {
  "sum_bucket": {
    "buckets_path": "dates"
  }
}
but this doesn't work, because "buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.bucket.range.date.InternalDateRange.Bucket". Any ideas how I should do this?
Upvotes: 3
Views: 2735
Reputation: 6076
Looks like this is a known problem. There's a discussion on the Elastic forum where I found a hack that solves it (thanks to Ruslan_Didyk, its author, btw):
POST my_aggs/my_doc/_search
{
  "size": 0,
  "aggregations": {
    "codes": {
      "terms": {
        "field": "code"
      },
      "aggs": {
        "dates": {
          "date_range": {
            "field": "created_time",
            "ranges": [
              {
                "from": "2017-12-06T00:00:00.000",
                "to": "2017-12-06T16:00:00.000"
              },
              {
                "from": "2017-12-07T00:00:00.000",
                "to": "2017-12-07T23:59:59.999"
              }
            ]
          },
          "aggs": {
            "my_cnt": {
              "value_count": {
                "field": "created_time"
              }
            }
          }
        },
        "totalcount": {
          "stats_bucket": {
            "buckets_path": "dates>my_cnt"
          }
        }
      }
    }
  }
}
The reason you can't use only totalcount is that date_range implicitly creates sub-buckets, and pipeline aggregations cannot handle that (I would call it a bug in Elasticsearch). So the hack is to add another sub-aggregation, my_cnt, to dates: it simply counts the documents in each bucket. (Note that I used a value_count aggregation on the created_time field, assuming it is present in all documents and has exactly one value.)
Given a set of documents like this:
{"code":"1234","created_time":"2017-12-06T01:00:00"}
{"code":"1234","created_time":"2017-12-06T17:00:00"}
{"code":"1234","created_time":"2017-12-07T01:00:00"}
{"code":"1234","created_time":"2017-12-06T02:00:00"}
{"code":"1235","created_time":"2017-12-07T18:00:00"}
{"code":"1234","created_time":"2017-12-07T18:00:00"}
The result of the aggregation will be:
"aggregations": {
"codes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1234",
"doc_count": 5,
"dates": {
"buckets": [
{
"key": "2017-12-06T00:00:00.000Z-2017-12-06T16:00:00.000Z",
"from": 1512518400000,
"from_as_string": "2017-12-06T00:00:00.000Z",
"to": 1512576000000,
"to_as_string": "2017-12-06T16:00:00.000Z",
"doc_count": 2,
"my_cnt": {
"value": 2
}
},
{
"key": "2017-12-07T00:00:00.000Z-2017-12-07T23:59:59.999Z",
"from": 1512604800000,
"from_as_string": "2017-12-07T00:00:00.000Z",
"to": 1512691199999,
"to_as_string": "2017-12-07T23:59:59.999Z",
"doc_count": 2,
"my_cnt": {
"value": 2
}
}
]
},
"totalcount": {
"count": 2,
"min": 2,
"max": 2,
"avg": 2,
"sum": 4
}
},
{
"key": "1235",
"doc_count": 1,
"dates": {
"buckets": [
{
"key": "2017-12-06T00:00:00.000Z-2017-12-06T16:00:00.000Z",
"from": 1512518400000,
"from_as_string": "2017-12-06T00:00:00.000Z",
"to": 1512576000000,
"to_as_string": "2017-12-06T16:00:00.000Z",
"doc_count": 0,
"my_cnt": {
"value": 0
}
},
{
"key": "2017-12-07T00:00:00.000Z-2017-12-07T23:59:59.999Z",
"from": 1512604800000,
"from_as_string": "2017-12-07T00:00:00.000Z",
"to": 1512691199999,
"to_as_string": "2017-12-07T23:59:59.999Z",
"doc_count": 1,
"my_cnt": {
"value": 1
}
}
]
},
"totalcount": {
"count": 1,
"min": 1,
"max": 1,
"avg": 1,
"sum": 1
}
}
]
}
}
The desired value is under totalcount.sum.
As I already said, this is valid only as long as the assumption holds that created_time is always present and has exactly one value. If in a different situation the field under the date_range aggregation could have several values (e.g. an update_time recording all updates of a document), the sum would no longer equal the actual number of matched documents (if those dates overlap). In that case you can always fall back to a filter aggregation with a range query inside.
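Such a fallback could look like the following sketch (the sub-aggregation name in_ranges is mine; the ranges mirror the ones above, with "to" treated as exclusive as in date_range):

```json
POST my_aggs/my_doc/_search
{
  "size": 0,
  "aggregations": {
    "codes": {
      "terms": { "field": "code" },
      "aggs": {
        "in_ranges": {
          "filter": {
            "bool": {
              "should": [
                { "range": { "created_time": { "gte": "2017-12-06T00:00:00.000", "lt": "2017-12-06T16:00:00.000" } } },
                { "range": { "created_time": { "gte": "2017-12-07T00:00:00.000", "lt": "2017-12-07T23:59:59.999" } } }
              ]
            }
          }
        }
      }
    }
  }
}
```

Here in_ranges.doc_count counts each matching document exactly once, even if the document's dates fall into several ranges, so it stays correct for multi-valued date fields.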
Hope that helps!
Upvotes: 1