Vioh
Vioh

Reputation: 31

Elasticsearch: How to return all documents that have the highest value in a field?

I'm quite a newbie to Elasticsearch, and I'm currently having some difficulty with a rather basic problem. Suppose I have the following mapping:

PUT /myindex/_mappings/people 
{
    "properties": {
        "name": {"type": "keyword"},
        "age" : {"type": "integer"},
    }
}

with the following documents:

{"name": "Bob", "age": 20},
{"name": "Ben", "age": 25},
{"name": "Eli", "age": 30},
{"name": "Eva", "age": 20},
{"name": "Jan", "age": 21},
{"name": "Jim", "age": 20},
{"name": "Lea", "age": 30},

How do I create a single query which returns all people that are the oldest in the index? In other words, I'm expecting Eli and Lea to be returned because they are both 30 years of age, older than everyone else.

I'm using Elasticsearch API 6.0.0 for javascript (my application is written in nodejs). Right now, my workaround is to perform 2 requests to the database. The first is to aggregate the max age (which should return 30), and then use this max age to perform another request:

GET /myindex/people/_search
{
    "aggs": {
        "max_age": {"max": {"field": "age"}}
    }
}

GET /myindex/people/_search
{
    "query": {"term": {"age": <max_age>}} // where <max_age> should be 30
}

Obviously, this is very inefficient. Can you help me formulate a single query that does all of this?

The difficult thing is I don't know beforehand how many documents that have the highest value, which means I can't use the "size" approach mentioned here "Single query to find document with largest value for some field"

Thanks in advance!

Upvotes: 3

Views: 2671

Answers (1)

Marko Vranjkovic
Marko Vranjkovic

Reputation: 6869

You can combine terms and top_hits aggregation like this

GET /myindex/people/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "terms": {
        "field": "age",
        "order": {
          "_term": "desc"
        },
        "size": 1
      },
      "aggs": {
        "oldest_people": {
          "top_hits": {
            "from": 0,
            "size": 9000
          }
        }
      }
    }
  }
}

Notice "order": { "_term": "desc" } and "size": 1 that returns only the bucket with max age from terms aggregation. Then we just list first 9000 (or arbitrary number) documents with top_hits.

Upvotes: 5

Related Questions