Reputation: 6364
Is there a way to do a group-by aggregation and also get all the documents that belong to each group?
This is not the usual group-by aggregation, where each group yields only aggregates/metrics: I also want all the records that contribute to each group's aggregate, in a single query. Is that possible in ES today?
For Example:
Input Dataset
{"name": "foo", "amount": 5, "city":"san francisco", "state": "CA"}
{"name": "foo", "amount": 10, "city":"Los angeles", "state": "CA"}
{"name": "bar", "amount": 20, "city":"Austin", "state": "TX"}
Now say I want to group by name and state, and get the sum of "amount" and the count for each group, along with the records that produced those aggregates. The expected output looks like this:
Expected Output:
[
{"group": {"name": "foo", "state": "CA"}, "amount": 15, "count": 2, "docs": [{"name": "foo", "amount": 5, "city":"san francisco", "state": "CA"}, {"name": "foo", "amount": 10, "city":"Los angeles", "state": "CA"}]},
{"group": {"name": "bar", "state": "TX"}, "amount": 20, "count": 1, "docs": [{"name": "bar", "amount": 20, "city":"Austin", "state": "TX"}]}
]
ES 5.0 is fine.
Upvotes: 1
Views: 580
Reputation: 141
You can use a combination of sub-aggregations to get all your group-by metrics, but it is a bad idea to return the hits as part of the aggregation. If you are grouping over N documents, you are asking Elasticsearch to return all N of them, which defeats the purpose of aggregating in the first place.
Each field you are "grouping" on (in ES parlance, running a terms aggregation on) needs its own aggregation, but you can nest them to any depth and programmatically serialize and deserialize the results according to the number of groupings you define. Make sure the fields you aggregate on are mapped as "keyword" types!
This query gives you all the metrics you want (the count comes back as each bucket's doc_count); you just need to flatten the result app-side:
{
  "aggs" : {
    "by_name" : {
      "terms" : { "field" : "name" },
      "aggs" : {
        "by_state" : {
          "terms" : { "field" : "state" },
          "aggs" : {
            "total_amount" : { "sum" : { "field" : "amount" } }
          }
        }
      }
    }
  }
}
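As a rough illustration of the app-side flattening, here is a minimal Python sketch. It assumes the response has the usual nested-bucket shape for the query above (by_name buckets containing by_state buckets, each with a total_amount value). The function name flatten_groups and the sample response are made up for illustration; the sample is hand-written, not real cluster output.

```python
def flatten_groups(response):
    """Flatten by_name -> by_state buckets into one row per (name, state) group."""
    rows = []
    for name_bucket in response["aggregations"]["by_name"]["buckets"]:
        for state_bucket in name_bucket["by_state"]["buckets"]:
            rows.append({
                "group": {"name": name_bucket["key"], "state": state_bucket["key"]},
                # The sum aggregation reports its result under "value".
                "amount": state_bucket["total_amount"]["value"],
                # Every bucket carries doc_count, so no separate count aggregation is needed.
                "count": state_bucket["doc_count"],
            })
    return rows


# Hand-written sample in the shape of a terms/sum aggregation response (illustrative only).
sample_response = {
    "aggregations": {
        "by_name": {
            "buckets": [
                {
                    "key": "foo",
                    "doc_count": 2,
                    "by_state": {
                        "buckets": [
                            {"key": "CA", "doc_count": 2, "total_amount": {"value": 15.0}}
                        ]
                    },
                },
                {
                    "key": "bar",
                    "doc_count": 1,
                    "by_state": {
                        "buckets": [
                            {"key": "TX", "doc_count": 1, "total_amount": {"value": 20.0}}
                        ]
                    },
                },
            ]
        }
    }
}

rows = flatten_groups(sample_response)
```

For deeper nesting, the same idea could be written recursively over a list of aggregation names instead of hard-coding two levels.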
If you really need those documents, you could load them on demand with term filters on a group's key values. Alternatively, if you really need to hack it and you understand the distribution of your data, you can use the top_hits sub-aggregation to return the documents. Be aware that each additional sub-aggregation, especially top_hits, will impact performance.
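Both options can be sketched in Python as plain request bodies. This is only a sketch under assumptions: the helper names and the "docs" aggregation name are hypothetical, the default of 10 documents per group is arbitrary, and top_hits returns at most the requested number of hits per bucket, so larger groups would be truncated.

```python
import json


def build_group_query(docs_per_group=10):
    """Request body: group by name, then state, with a sum and a top_hits sub-aggregation."""
    return {
        "size": 0,  # skip the top-level hits; only the aggregations are wanted
        "aggs": {
            "by_name": {
                "terms": {"field": "name"},
                "aggs": {
                    "by_state": {
                        "terms": {"field": "state"},
                        "aggs": {
                            "total_amount": {"sum": {"field": "amount"}},
                            # "docs" is an arbitrary aggregation name chosen for this sketch.
                            "docs": {"top_hits": {"size": docs_per_group}},
                        },
                    }
                },
            }
        },
    }


def build_group_docs_query(name, state):
    """Request body: load one group's documents on demand with term filters."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"name": name}},
                    {"term": {"state": state}},
                ]
            }
        }
    }


def docs_from_bucket(state_bucket):
    """Pull the original documents out of a by_state bucket's top_hits result."""
    return [hit["_source"] for hit in state_bucket["docs"]["hits"]["hits"]]


if __name__ == "__main__":
    print(json.dumps(build_group_query(2), indent=2))
    print(json.dumps(build_group_docs_query("foo", "CA"), indent=2))
```

The term-filter variant costs one extra request per group but keeps the aggregation response small; the top_hits variant stays in a single request at the price of a heavier aggregation.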
Upvotes: 1