Reputation: 1
I've got some simple log data in an ES cluster. The mappings are:
{
  "category": {
    "type": "string"
  },
  "element": {
    "type": "long"
  },
  "group": {
    "type": "string"
  },
  "seen_at": {
    "type": "date",
    "format": "dateOptionalTime"
  }
}
...which I want to aggregate into various time-based intervals and store in another index. I need intervals for each element (basically the ID of some resource) in each category/group. I came up with these nested aggregations:
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "groups": {
          "terms": {
            "field": "group"
          },
          "aggs": {
            "elements": {
              "terms": {
                "field": "element"
              },
              "aggs": {
                "annual": {
                  "date_histogram": {
                    "field": "seen_at",
                    "interval": "1d",
                    "format": "yyyy-MM-dd"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
...but it seems to return only part of the data (not all "element" IDs appear in the aggregated results). There are no timeouts and no errors, so I guess the problem is something in my nested aggregation query.
Any ideas?
Upvotes: 0
Views: 76
Reputation: 217274
By default, a terms aggregation returns only the top 10 buckets (ordered by document count), so anything beyond the first 10 buckets at each level is left out of the response without any error. You can change this behavior by adding a size parameter to each terms aggregation. Below I've increased the limit to 100 on all three terms aggregations, but you can raise (or lower) it to better fit your needs.
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 100
      },
      "aggs": {
        "groups": {
          "terms": {
            "field": "group",
            "size": 100
          },
          "aggs": {
            "elements": {
              "terms": {
                "field": "element",
                "size": 100
              },
              "aggs": {
                "annual": {
                  "date_histogram": {
                    "field": "seen_at",
                    "interval": "1d",
                    "format": "yyyy-MM-dd"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
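If you build this request from code, a small helper keeps the size consistent across all three levels. Below is a minimal Python sketch, not part of any Elasticsearch client: the names build_query and truncated_terms are hypothetical, and it assumes you have already parsed the response's "aggregations" object into a dict with list-style (non-keyed) buckets. In Elasticsearch versions that report it, each terms aggregation in the response carries a sum_other_doc_count field, the number of matching documents that fell outside the returned buckets, so a non-zero value suggests the size at that level is still too small.

```python
import json


def build_query(size=100, interval="1d"):
    """Build the nested category > group > element > date_histogram body.

    `size` is applied to every terms aggregation (Elasticsearch defaults to 10).
    """
    def terms(field, sub_aggs):
        # One terms aggregation with an explicit size and its sub-aggregations.
        return {"terms": {"field": field, "size": size}, "aggs": sub_aggs}

    histogram = {
        "annual": {
            "date_histogram": {
                "field": "seen_at",
                "interval": interval,
                "format": "yyyy-MM-dd",
            }
        }
    }
    return {
        "size": 0,
        "aggs": {
            "categories": terms("category", {
                "groups": terms("group", {
                    "elements": terms("element", histogram),
                }),
            }),
        },
    }


def truncated_terms(aggs, path=()):
    """Yield (path, sum_other_doc_count) for each aggregation in a parsed
    "aggregations" dict that left documents out of its returned buckets.

    Assumes buckets are returned as a list (the default, non-keyed form).
    """
    for name, agg in aggs.items():
        if not isinstance(agg, dict) or "buckets" not in agg:
            continue
        here = path + (name,)
        # date_histogram results have no sum_other_doc_count, so default to 0.
        if agg.get("sum_other_doc_count", 0) > 0:
            yield here, agg["sum_other_doc_count"]
        for bucket in agg["buckets"]:
            # Sub-aggregation results are the dict-valued fields of a bucket.
            sub = {k: v for k, v in bucket.items() if isinstance(v, dict)}
            yield from truncated_terms(sub, here + (str(bucket.get("key")),))


if __name__ == "__main__":
    print(json.dumps(build_query(), indent=2))
```

Keep in mind that a nested size applies per parent bucket: with 100 at every level you get up to 100 elements for each group within each category, so the total number of buckets can grow quickly.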
Upvotes: 2