nascosto
nascosto

Reputation: 411

How to do a year over year aggregation with Elasticsearch?

Assuming I have a date field on a document, I know using the date_histogram aggregation I can get a document count by day, month, year, etc.

What I want to do is get the average document count for January, February, March, etc. over several given years. The same goes for Monday, Tuesday, Wednesday, etc. over several given weeks. Is there a way to do this having just that same date field or what is the best way to accomplish this with Elasticsearch?


Example

Let's say we have a bunch of orders placed over three years:

What I want is the average of each month over the given years, so the output would be:

Jan (10 + 13 + 10 / 3 = 11 orders), Feb (6.33 orders), Mar (8.33 orders), Apr (13 orders), etc.

It would be best if this can be generalized for N years (or N Januaries, etc.) so that we search over any date range.

Upvotes: 5

Views: 2408

Answers (2)

RichS
RichS

Reputation: 3147

You can use 'monthOfYear' like this:

"aggregations": {
    "timeslice": {
        "histogram": {
            "script": "doc['timestamp'].date.getMonthOfYear()",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
                "min": 1,
                "max": 12
            },
            "order": {
                "_key": "desc"
            }
        }
    }

The extended bounds will ensure you get a value for every month (even if it is zero).

If you want the month names, you can either do that in your own code, or, do this (at the consequence that you won't get values for months that have no data):

"aggregations": {
    "monthOfYear": {
        "terms": {
            "script": "doc['timestamp'].date.monthOfYear().getAsText()",
            "order": {
                "_term": "asc"
            }
        }
    }

Once you've got this, you can nest your stats aggregation inside this one:

"aggregations: {
    "monthOfYear": {
        "terms": {
            ...
        },
        "aggregations": {
            "stats": ...
        }
    }
 }

The question is pretty old now, but, hope this helps someone.

Upvotes: 4

Olly Cruickshank
Olly Cruickshank

Reputation: 6180

My understanding of what you want is:

You'd like to see the average number of documents per month in yearly buckets

is that correct?

if so, you could count the number of documents in a year (i.e. the yearly bucket) and then divide by 12 using a script.

E.g. to show the daily average doc count in weekly buckets (assuming 30 days per month):

curl -XGET 'http://localhost:9200/index/type/_search?pretty' -d '{
  "aggs" : {
    "monthly_bucket": {
         "date_histogram": {"field": "datefield","interval": "week"},
              "aggs" : {
                    "weekly_average": {"sum" : {"script" : " doc[\"datefield\"].value>0 ? 1/30 : 0"} }}
         }
     }
}'

Upvotes: 0

Related Questions