Shaun

Reputation: 2530

Aggregations on the most recent document in each group using Elasticsearch

Suppose there are several documents per person that contain values:

{
  "name": "John",
  "value": 1,
  "timestamp": "2014-06-15"
}

{
  "name": "John",
  "value": 2,
  "timestamp": "2014-06-16"
}

{
  "name": "Sam",
  "value": 2,
  "timestamp": "2014-06-15"
}

{
  "name": "Sam",
  "value": 3,
  "timestamp": "2014-06-16"
}

  1. How do I get the most recent document for each person?
  2. How do I get the average of the values in each person's most recent document? Given the sample data, this would be 2.5, not 2.
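The expected result in question 2 comes from averaging only each person's latest value (John: 2, Sam: 3), as opposed to averaging every document in the sample:

$$
\bar{v}_{\text{latest}} = \frac{2 + 3}{2} = 2.5, \qquad \bar{v}_{\text{all}} = \frac{1 + 2 + 2 + 3}{4} = 2
$$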

Is there some combination of buckets and metrics that could achieve this result? Will I need to implement a custom aggregator as part of a plugin, or must this sort of computation be performed in memory?

Upvotes: 4

Views: 2838

Answers (2)

Nelu

Reputation: 18680

If you only need to find the persons with the most recent documents, try something like this:

"aggs": {
    "personName": {
        "terms": {
            "field": "name",
            "size": 5,
            "order": {"timeCreated": "desc"}
        },
        "aggs": {
            "timeCreated": {
                "max": {"field": "timestamp"}
            }
        }
    }
}
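To make clear what the aggregation above returns, here is a hypothetical in-memory Python mirror of it, run over the sample documents from the question. The function name `terms_by_max_timestamp` is made up for illustration, and this is not how Elasticsearch executes the query. Each bucket holds only a name and that person's latest timestamp (`timeCreated`), not the `value` of the latest document.

```python
# Hypothetical in-memory mirror of the "terms" + "max" aggregation above.
# It only illustrates what the buckets contain; Elasticsearch does not run this.
docs = [
    {"name": "John", "value": 1, "timestamp": "2014-06-15"},
    {"name": "John", "value": 2, "timestamp": "2014-06-16"},
    {"name": "Sam", "value": 2, "timestamp": "2014-06-15"},
    {"name": "Sam", "value": 3, "timestamp": "2014-06-16"},
]


def terms_by_max_timestamp(docs, size=5):
    # One bucket per distinct name, keeping the max timestamp ("timeCreated").
    latest = {}
    for doc in docs:
        name, ts = doc["name"], doc["timestamp"]
        # ISO-8601 date strings compare chronologically as plain strings.
        if name not in latest or ts > latest[name]:
            latest[name] = ts
    buckets = [{"key": k, "timeCreated": v} for k, v in latest.items()]
    # Mirror "order": {"timeCreated": "desc"} and "size": 5.
    buckets.sort(key=lambda b: b["timeCreated"], reverse=True)
    return buckets[:size]


for bucket in terms_by_max_timestamp(docs):
    print(bucket)
```

Getting the latest document's `value` as well would need something beyond a `max` on the timestamp.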

Upvotes: 2

AlvaroAV

Reputation: 10553

The second operation is just an aggregation; to get the average of the value field per person, you could try something like:

curl -XPOST "http://DOMAIN:9200/your/data/_search" -H "Content-Type: application/json" -d'
{
   "size": 0, 
   "aggregations": {
      "the_name": {
         "terms": {
            "field": "name",
            "order": {
               "value_avg": "desc"
            }
         },
         "aggregations": {
            "value_avg": {
               "avg": {
                  "field": "value"
               }
            }
         }
      }
   }
}'
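As a hypothetical in-memory illustration (the function name `avg_by_name` is made up, and this is not how Elasticsearch computes it), the query above groups by `name` and averages every document in each group. On the sample data it therefore yields a per-person average over all of that person's documents, not the average over each person's latest document that question 2 asks for.

```python
# Hypothetical in-memory mirror of "terms" on name + "avg" on value,
# ordered by the average descending. Illustration only.
docs = [
    {"name": "John", "value": 1, "timestamp": "2014-06-15"},
    {"name": "John", "value": 2, "timestamp": "2014-06-16"},
    {"name": "Sam", "value": 2, "timestamp": "2014-06-15"},
    {"name": "Sam", "value": 3, "timestamp": "2014-06-16"},
]


def avg_by_name(docs):
    # Group every value by name, regardless of timestamp.
    groups = {}
    for doc in docs:
        groups.setdefault(doc["name"], []).append(doc["value"])
    buckets = [{"key": k, "value_avg": sum(v) / len(v)} for k, v in groups.items()]
    # Mirror "order": {"value_avg": "desc"}.
    buckets.sort(key=lambda b: b["value_avg"], reverse=True)
    return buckets


print(avg_by_name(docs))
# [{'key': 'Sam', 'value_avg': 2.5}, {'key': 'John', 'value_avg': 1.5}]
```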

For your first question, I would recommend sorting the results by date (newest first) and then, in your application, ignoring any document whose name you have already seen; that is, filter the data after Elasticsearch returns its response.
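A minimal sketch of that client-side filtering in Python, assuming the search was sorted by `timestamp` descending and that each hit's `_source` holds the original `name`/`value`/`timestamp` fields. The helper names `latest_per_name` and `average_value` are hypothetical. Because hits arrive newest first, the first hit seen for a name is that person's latest document, and averaging those gives the 2.5 from question 2. Note that the search must actually return every relevant document for this to be correct; a plain search returns only 10 hits by default.

```python
# Hypothetical post-processing of search hits, assumed already sorted by
# "timestamp" descending (newest first).
hits = [
    {"_source": {"name": "John", "value": 2, "timestamp": "2014-06-16"}},
    {"_source": {"name": "Sam", "value": 3, "timestamp": "2014-06-16"}},
    {"_source": {"name": "John", "value": 1, "timestamp": "2014-06-15"}},
    {"_source": {"name": "Sam", "value": 2, "timestamp": "2014-06-15"}},
]


def latest_per_name(hits):
    # Keep the first hit per name; later hits for that name are older.
    latest = {}
    for hit in hits:
        source = hit["_source"]
        if source["name"] not in latest:
            latest[source["name"]] = source
    return list(latest.values())


def average_value(docs):
    # Average of "value" across the given documents.
    return sum(d["value"] for d in docs) / len(docs)


latest = latest_per_name(hits)
print(latest)
print(average_value(latest))  # 2.5
```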

Upvotes: 1
