Jeff Saremi

Reputation: 3004

How to do nested aggregation in ElasticSearch where the aggregation is over the results of a sub-aggregation

I'm using ElasticSearch 5.3. Any guidance on how this is done in ES or in Kibana would be appreciated. I have read the docs, especially on scoped, nested, and pipeline aggregations, and have not been able to get any of them to work or produce what I'm after.

Instead of describing what I want in generic terms, I'd like to formulate my problem as a relational DB problem:

This is my table:

CREATE TABLE metrics
    (`host` varchar(50), `counter` int, `time` int)
;

INSERT INTO metrics
    (`host`, `counter`, `time`)
VALUES
    ('host1', 3, 1),
    ('host2', 2, 2),
    ('host3', 1, 3),
    ('host1', 5, 4),
    ('host2', 2, 5),
    ('host3', 2, 6),
    ('host1', 9, 7),
    ('host2', 3, 8),
    ('host3', 5, 9);

I want to get the total value of the counter across all hosts. Note that each host emits an ever-increasing value for its counter, so I cannot just add up the counters for each record. Instead I need to use the following SQL:

select sum(max_counter)
from ( select max(counter) as max_counter
    from metrics
    where time > 0 AND time < 10
    group by host) as temptable;

which produces the correct result: 17 (= 9 + 3 + 5).

Upvotes: 1

Views: 693

Answers (1)

Eugene

Reputation: 3957

You can achieve it with a pipeline aggregation:

{
    "size": 0,
    "aggs": {
        "hosts": {
            "terms": {
                "field": "host"
            },
            "aggs": {
                "maxCounter": {
                    "max": {
                        "field": "counter"
                    }
                }
            }
        },
        "sumCounter": {
            "sum_bucket": {
                "buckets_path": "hosts>maxCounter"
            }
        }
    },
    "query": {
        "range": {
            "time": {
                "gt": 0.0,
                "lt": 10.0
            }
        }
    }
}

First, you group your entries by the host field in the hosts terms aggregation, and inside it you apply a max aggregation. Then the sum_bucket pipeline aggregation picks up the per-host maxima via its buckets_path and returns the required sum. The range query filters the entries by time, just like the WHERE clause in your SQL.
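For reference, here is a sketch of running this request with curl against a local node; the index name metrics and the localhost:9200 endpoint are assumptions on my part, and query.json is just an illustrative file holding the request body above:

curl -s -H 'Content-Type: application/json' \
    -XPOST 'localhost:9200/metrics/_search?pretty' \
    --data-binary @query.json

The aggregation output comes back under the aggregations key, as shown below.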

Here is the result:

{
    "took": 22,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 9,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "hosts": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "host1",
                    "doc_count": 3,
                    "maxCounter": {
                        "value": 9.0
                    }
                },
                {
                    "key": "host2",
                    "doc_count": 3,
                    "maxCounter": {
                        "value": 3.0
                    }
                },
                {
                    "key": "host3",
                    "doc_count": 3,
                    "maxCounter": {
                        "value": 5.0
                    }
                }
            ]
        },
        "sumCounter": {
            "value": 17.0
        }
    }
}

sumCounter is equal to 17, which matches the SQL result.

Just in case, here is the original mapping:

{
    "mappings": {
        "metrics": {
            "properties": {
                "host": {
                    "type": "text",
                    "fielddata": true
                },
                "counter": {
                    "type": "integer"
                },
                "time": {
                    "type": "integer"
                }
            }
        }
    }
}
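And in case you want to reproduce this, here is a sketch of indexing the sample rows from the question through the bulk API; the metrics index and type names follow the mapping above, while the localhost:9200 endpoint is an assumption:

curl -s -H 'Content-Type: application/x-ndjson' \
    -XPOST 'localhost:9200/metrics/metrics/_bulk' --data-binary '
{"index":{}}
{"host":"host1","counter":3,"time":1}
{"index":{}}
{"host":"host2","counter":2,"time":2}
{"index":{}}
{"host":"host3","counter":1,"time":3}
{"index":{}}
{"host":"host1","counter":5,"time":4}
{"index":{}}
{"host":"host2","counter":2,"time":5}
{"index":{}}
{"host":"host3","counter":2,"time":6}
{"index":{}}
{"host":"host1","counter":9,"time":7}
{"index":{}}
{"host":"host2","counter":3,"time":8}
{"index":{}}
{"host":"host3","counter":5,"time":9}
'

After the bulk call, refresh the index (POST /metrics/_refresh) or wait for the refresh interval before running the search.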

Upvotes: 1
