Paweł Sosnowski

Reputation: 172

Elasticsearch java api get average of terms aggregation

I'm using Elasticsearch with the Java API, and I'm trying to get the average of the lowest value from each bucket of a terms aggregation. One solution I found is to fetch the per-bucket minimums like this

AggregationBuilders.terms("group_by_flights").field("flight_id")
    .subAggregation(AggregationBuilders.min("minimum").field("duration"))

and then compute the average on the client side. The problem is that with a large number of buckets, this allocates a lot of memory. I would like to do this on the Elasticsearch side. I found that there is an avg_bucket pipeline aggregation, which can be added as a sibling of a terms aggregation (among others):

"the average": {
  "avg_bucket": {
    "buckets_path": "some_bucket_path" 
  }
}

The problem is that in the Java API, it seems you can add a pipeline aggregation only as a sub-aggregation. So if we construct the aggregation like this, the pipeline won't see our terms aggregation:

AggregationBuilders.terms("group_by_flights").field("flight_id")
    // the terms aggregation won't be seen from here, because avgBucket is its sub-aggregation
    .subAggregation(PipelineAggregatorBuilders.avgBucket("avg", "group_by_flights.duration"))

I was thinking about creating an empty top-level aggregation and adding all the aggregations as its sub-aggregations, but that seems like a silly workaround, and I suspect I'm misunderstanding something. Any ideas?

Upvotes: 1

Views: 2537

Answers (2)

SuperPirate

Reputation: 146

My solution is to use FilterAggregationBuilder, which can also filter the data. The first sub-aggregation builds the buckets, and the second sub-aggregation merges the bucket data.

AggregationBuilders.filter("global_aggregation", bool)
    .subAggregation((AggregationBuilders.terms("group_by_flights").field("flight_id"))
        .subAggregation(AggregationBuilders.min("min").field("duration")))
    .subAggregation(PipelineAggregatorBuilders.avgBucket("avg_bucket_aggs", "group_by_flights>min"));

Upvotes: 1

Paweł Sosnowski

Reputation: 172

The only solution I have found so far is to make the aggregations sub-aggregations of an "empty" global aggregation:

AggregationBuilders.global("global_aggregation")
    .subAggregation((AggregationBuilders.terms("group_by_flights").field("flight_id"))
        .subAggregation(AggregationBuilders.min("min").field("duration")))
    .subAggregation(PipelineAggregatorBuilders.avgBucket("avg_bucket_aggs","group_by_flights>min"))
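
As a sanity check, the value that avg_bucket over "group_by_flights>min" should produce can be reproduced in plain Java on a small sample. This is only a sketch of the arithmetic (the minimum duration per flight_id, then the mean of those minimums) and does not call Elasticsearch at all; the Flight class, the averageOfMinimums helper, and the sample data are hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AvgOfBucketMinimums {

    /** Hypothetical stand-in for one indexed document. */
    public static final class Flight {
        final String flightId;
        final double duration;

        public Flight(String flightId, double duration) {
            this.flightId = flightId;
            this.duration = duration;
        }
    }

    /**
     * Mirrors terms("group_by_flights") + min("min") + avg_bucket:
     * keep the smallest duration per flight id, then average those minimums.
     * Returns NaN when there are no documents (a choice made for this sketch).
     */
    public static double averageOfMinimums(List<Flight> docs) {
        Map<String, Double> minByFlight = new HashMap<>();
        for (Flight f : docs) {
            // One map entry per bucket, holding the lowest duration seen so far.
            minByFlight.merge(f.flightId, f.duration, Math::min);
        }
        return minByFlight.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Flight> docs = Arrays.asList(
                new Flight("A", 120), new Flight("A", 90),
                new Flight("B", 60), new Flight("B", 75));
        // Minimums are A=90 and B=60, so the average is 75.0.
        System.out.println(averageOfMinimums(docs));
    }
}
```

The map holds one entry per bucket, which is the memory cost the question wants to avoid for large result sets; the sketch is meant for verifying the server-side result on a small sample, not as a replacement for it.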

Upvotes: 1
