Reputation: 306
Elasticsearch builds aggregation results from all hits of the query, regardless of the from and size parameters. This is what we want in most cases, but I have a particular case in which I need to limit the aggregation to the top N hits. The limit filter is not suitable, as it does not fetch the best N items but only the first X documents matching the query (per shard), regardless of their score.
Is there any way to build a query whose hit count has an upper limit N in order to be able to build an aggregation limited to those top N results? And if so how?
Subsidiary question: Limiting the score of matching documents could be an alternative, even though in my case I would need a fixed bound on the number of hits. Does the min_score parameter affect aggregations?
Upvotes: 8
Views: 5146
Reputation: 16355
You are looking for Sampler Aggregation.
I have a similar answer explained here
Optionally, you can use the field or script and max_docs_per_value settings to control the maximum number of documents collected on any one shard which share a common value.
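As a rough illustration, here is a minimal sketch of a request body that wraps a terms aggregation in a sampler aggregation. The index fields (title, tags), the aggregation names (top_n, by_field) and the helper function are hypothetical and only there for the example. Note that shard_size limits the number of top-scoring documents collected on each shard, so with more than one shard the sample can contain more than N documents overall.

```python
import json


def sampled_aggregation_body(query, n, field):
    """Build a search body whose terms aggregation only sees the
    best-scoring documents collected by a sampler aggregation.

    `n` is passed as shard_size, which applies per shard (assumption:
    a single-shard index if exactly N documents are wanted).
    """
    return {
        "size": 0,  # we only care about the aggregation, not the hits
        "query": query,
        "aggs": {
            "top_n": {  # hypothetical aggregation name
                "sampler": {"shard_size": n},
                "aggs": {
                    "by_field": {  # hypothetical aggregation name
                        "terms": {"field": field}
                    }
                },
            }
        },
    }


# Hypothetical query and field names, for illustration only.
body = sampled_aggregation_body({"match": {"title": "elasticsearch"}}, 100, "tags")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request with whatever client you use.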
Upvotes: 2
Reputation: 4537
It looks like the Sampler Aggregation can now be used for this purpose. Note that it is only available as of Elasticsearch 2.0.
Upvotes: 0
Reputation: 1312
I need to limit the aggregation to the top N hits
With nested aggregations, your top-level bucket can represent those N hits, with the nested aggregations operating on that bucket. I would try a filter aggregation for the top-level aggregation.
The tricky part is to make use of the _score somehow in the filter and to limit it to exactly N entries. There is a limit filter that works per shard, but I don't think it would work in this context.
Upvotes: 0