Reputation: 503
I ran several aggregations to SUM some values on our installation of ES 1.7.2.
I found out the hard way that, in some seemingly random cases, the doc_count of a bucket doesn't match the sum of the doc_count values of its nested buckets. For example:
"key": 503,
"doc_count": 383778,
"regionid": {...}
So the parent bucket has doc_count=383778.
But if I sum the doc_count of every bucket in the regionid list below, I get doc_count=383718:
"key": 503,
"doc_count": 383778,
"regionid": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 303821,
"ProviderId": {...}
},
{
"key": 27,
"doc_count": 23834,
"ProviderId": {...}
},
{
"key": 25,
"doc_count": 9565,
"ProviderId": {...}
},
{
"key": 36,
"doc_count": 8857,
"ProviderId": {...}
},
{
"key": 14,
"doc_count": 8222,
"ProviderId": {...}
},
{
"key": 68,
"doc_count": 6746,
"ProviderId": {...}
},
{
"key": 19,
"doc_count": 4574,
"ProviderId": {...}
},
{
"key": 28,
"doc_count": 4164,
"ProviderId": {...}
},
{
"key": 10,
"doc_count": 3006,
"ProviderId": {...}
},
{
"key": 31,
"doc_count": 2020,
"ProviderId": {...}
},
{
"key": 21,
"doc_count": 1410,
"ProviderId": {...}
},
{
"key": 32,
"doc_count": 1368,
"ProviderId": {...}
},
{
"key": 22,
"doc_count": 1367,
"ProviderId": {...}
},
{
"key": 8,
"doc_count": 1010,
"ProviderId": {...}
},
{
"key": 16,
"doc_count": 825,
"ProviderId": {...}
},
{
"key": 35,
"doc_count": 559,
"ProviderId": {...}
},
{
"key": 34,
"doc_count": 517,
"ProviderId": {...}
},
{
"key": 26,
"doc_count": 414,
"ProviderId": {...}
},
{
"key": 18,
"doc_count": 371,
"ProviderId": {...}
},
{
"key": 15,
"doc_count": 362,
"ProviderId": {...}
},
{
"key": 33,
"doc_count": 185,
"ProviderId": {...}
},
{
"key": 9,
"doc_count": 143,
"ProviderId": {...}
},
{
"key": 29,
"doc_count": 102,
"ProviderId": {...}
},
{
"key": 17,
"doc_count": 100,
"ProviderId": {...}
},
{
"key": 30,
"doc_count": 96,
"ProviderId": {...}
},
{
"key": 20,
"doc_count": 80,
"ProviderId": {...}
}
]
}
},
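To make the mismatch concrete, here is a quick check in Python. The values are copied from the response above; the variable names are only for illustration.

```python
# doc_count values copied from the regionid buckets in the response above
region_doc_counts = [
    303821, 23834, 9565, 8857, 8222, 6746, 4574, 4164, 3006, 2020,
    1410, 1368, 1367, 1010, 825, 559, 517, 414, 371, 362,
    185, 143, 102, 100, 96, 80,
]
parent_doc_count = 383778  # doc_count of the parent bucket (key 503)

child_total = sum(region_doc_counts)
missing = parent_doc_count - child_total

print(child_total)  # 383718
print(missing)      # 60
```

So 60 documents counted in the parent bucket are not accounted for in any regionid bucket.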
Does anyone know why this is happening?
Could it be a bug?
Part of my aggregation:
{
  "aggs": {
    "Provider": {
      "terms": {
        "field": "Provider"
      },
      "aggs": {
        "Gateway": {
          "terms": {
            "field": "Gateway"
          },
          "aggs": {
            "CustomerId": {
              "terms": {
                "field": "CustomerId"
              },
              "aggs": {
                "regionid": {
                  "terms": {
                    "field": "regionid"
Any help is appreciated. Thanks
Upvotes: 0
Views: 1204
Reputation: 12449
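Terms aggregations in ES are not always exact. Each shard returns only its own top terms, and those per-shard results are then merged, so the counts are an estimate that depends on how many terms each shard contributes. With a big enough per-shard sample the numbers can be exact, but that has significant performance implications.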
You can read more under "Shard Size" in the ES documentation for the terms aggregation (the shard_size parameter).
The flatter your index (meaning the more buckets the aggregation returns), the more you need to increase the shard size. We found that for a flat index in our system a 20x multiplier was a good rule of thumb: if we're returning the top 10 buckets for an aggregation, we use a shard_size of 200.
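As a rough sketch of that rule of thumb, the request body for one level could be built like this in Python. The helper name terms_agg and its shard_multiplier argument are hypothetical; size and shard_size are the terms aggregation parameters, and the 20x factor is just what worked for us, not a general recommendation.

```python
import json


def terms_agg(field, size, shard_multiplier=20):
    """Build a terms aggregation body with shard_size = size * shard_multiplier.

    Hypothetical helper for illustration; the multiplier is an assumption
    to tune for your own data.
    """
    return {
        "aggs": {
            field: {
                "terms": {
                    "field": field,
                    "size": size,  # buckets returned to the client
                    "shard_size": size * shard_multiplier,  # buckets requested from each shard
                }
            }
        }
    }


# Top 10 regionid buckets, asking each shard for 200 candidates
body = terms_agg("regionid", size=10)
print(json.dumps(body, indent=2))
```

The same size and shard_size pair would need to be set on each nested terms level where the counts matter.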
Upvotes: 3