zep

Elasticsearch: aggregating log entries with nested aggregations

I've got some simple log-file data in an ES cluster. The mappings are:

{
    "category": {
        "type": "string"
    },
    "element": {
        "type": "long"
    },
    "group": {
        "type": "string"
    },
    "seen_at": {
        "type": "date",
        "format": "dateOptionalTime"
    }
}

...which I want to aggregate over various time intervals and store in another index. I need the intervals for each element (basically the ID of some resource) within each category/group. I came up with these nested aggregations:

{
    "size": 0,
    "aggs": {
        "categories": { 
            "terms": {
                "field": "category"
            },
            "aggs": {
                "groups": {
                    "terms": {
                        "field": "group"
                    },
                    "aggs": {
                        "elements": {
                            "terms": {
                                "field": "element"
                            },
                            "aggs": {
                                "annual": {
                                    "date_histogram": {
                                        "field": "seen_at",
                                        "interval": "1d",
                                        "format": "yyyy-MM-dd"
                                    }
                                }
                            }        
                        }
                    }                    
                }
            }
        }
    }
}

...but it seems to return only part of the data: not all "element" IDs appear in the aggregated results. There are no timeouts and no errors, so I suspect the problem is somewhere in my nested aggregation query.

Any ideas?

Answers (1)

Val

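By default, a terms aggregation returns only the 10 top-most buckets (the terms with the highest document counts). Because your terms aggregations are nested, that limit applies at every level: at most 10 categories, 10 groups per category and 10 elements per group. You can change this by adding a size parameter to each terms aggregation. Below I've raised the limit to 100 at all three levels, but you can go higher (or lower) to fit your needs.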

{
    "size": 0,
    "aggs": {
        "categories": { 
            "terms": {
                "field": "category",
                "size": 100
            },
            "aggs": {
                "groups": {
                    "terms": {
                        "field": "group",
                        "size": 100
                    },
                    "aggs": {
                        "elements": {
                            "terms": {
                                "field": "element",
                                "size": 100
                            },
                            "aggs": {
                                "annual": {
                                    "date_histogram": {
                                        "field": "seen_at",
                                        "interval": "1d",
                                        "format": "yyyy-MM-dd"
                                    }
                                }
                            }        
                        }
                    }                    
                }
            }
        }
    }
}
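Whether a chosen size is large enough can be checked from the response itself: each terms aggregation reports `sum_other_doc_count`, the number of documents that fell into buckets not returned, so a non-zero value means that level was cut off. A minimal sketch, assuming the parsed `aggregations` object of the query above as a Python dict and a version of Elasticsearch whose terms responses include that field; the helper name `find_truncated_levels` is hypothetical:

```python
from typing import Any

# Nested terms aggregation names from the query, outermost first.
LEVELS: tuple[str, ...] = ("categories", "groups", "elements")


def find_truncated_levels(
    agg: dict[str, Any],
    levels: tuple[str, ...] = LEVELS,
    path: tuple[Any, ...] = (),
) -> list[tuple[Any, ...]]:
    """Return the location of every terms aggregation that reports a
    non-zero sum_other_doc_count, i.e. whose buckets were cut off by "size".
    Each location is the parent bucket keys followed by the aggregation name."""
    if not levels:
        return []
    name, rest = levels[0], levels[1:]
    terms = agg[name]
    hits: list[tuple[Any, ...]] = []
    if terms.get("sum_other_doc_count", 0) > 0:
        hits.append(path + (name,))
    # Recurse into each returned bucket, one level deeper.
    for bucket in terms["buckets"]:
        hits.extend(find_truncated_levels(bucket, rest, path + (bucket["key"],)))
    return hits
```

Calling `find_truncated_levels(response["aggregations"])` returns an empty list when every category, group and element was returned; otherwise each entry points at a terms level whose size should be raised.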
