Group documents by field value

Question

NOTE: this is NOT a «how to get counts of distinct values» question. I want the docs, not the counts.

Let's say I have this mapping:

country, color, height, weight

I have indexed these documents:

1. RU, red, 180, 90
2. BY, green, 170, 80
3. BY, blue, 180, 75
4. KZ, blue, 180, 95
5. KZ, red, 185, 100
6. KZ, red, 175, 80
7. KZ, red, 170, 80

I want to execute a query like groupby(country, color, doc_limit=2) which would return something like this:

{
  "RU": {
    "red": [
      (doc 1. RU, red, 180, 90)
    ]
  },
  "BY": {
    "green": [
      (doc 2)
    ],
    "blue": [
      (doc 3)
    ]
  },
  "KZ": {
    "blue": [
      (doc 4)
    ],
    "red": [
      (doc 5),
      (doc 6)
    ]
  }
}

with no more than 2 documents in each bucket.

How would I do that?

Val · Accepted Answer

That can be achieved with a terms aggregation on the country field, combined with a terms sub-aggregation on the color field, and finally a top_hits aggregation that returns up to 2 matching docs per bucket:

{
   "size": 0,
   "aggs": {
      "countries": {
         "terms": {
            "field": "country"
         },
         "aggs": {
            "colors": {
               "terms": {
                  "field": "color"
               },
               "aggs": {
                  "docs": {
                     "top_hits": {
                        "size": 2
                     }
                  }
               }
            }
         }
      }
   }
}
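
The response comes back as nested buckets rather than the exact shape shown in the question, so a small client-side step is still needed. Below is a minimal Python sketch of that reshaping. The helper name `group_hits` is hypothetical, and `sample_response` is a hand-written, trimmed stand-in for what the query above might return (not real output). It assumes the aggregation names used above (countries, colors, docs). With no query and no sort, which 2 of the 3 red KZ docs come back is not something to rely on; docs 5 and 6 are picked here only for illustration.

```python
def group_hits(response, outer="countries", inner="colors", hits="docs"):
    """Reshape a terms > terms > top_hits response into {outer: {inner: [docs]}}."""
    grouped = {}
    for outer_bucket in response["aggregations"][outer]["buckets"]:
        by_inner = {}
        for inner_bucket in outer_bucket[inner]["buckets"]:
            # top_hits nests the matching documents under hits.hits[]._source
            by_inner[inner_bucket["key"]] = [
                hit["_source"] for hit in inner_bucket[hits]["hits"]["hits"]
            ]
        grouped[outer_bucket["key"]] = by_inner
    return grouped


def _hit(country, color, height, weight):
    # Builds one hand-written hit; real hits carry more metadata (_index, _id, _score).
    return {"_source": {"country": country, "color": color,
                        "height": height, "weight": weight}}


# Hand-written sample (assumption): only the KZ and RU buckets, trimmed to the
# keys that group_hits reads.
sample_response = {
    "aggregations": {
        "countries": {
            "buckets": [
                {"key": "KZ", "doc_count": 4, "colors": {"buckets": [
                    {"key": "red", "doc_count": 3, "docs": {"hits": {"hits": [
                        _hit("KZ", "red", 185, 100),
                        _hit("KZ", "red", 175, 80),
                    ]}}},
                    {"key": "blue", "doc_count": 1, "docs": {"hits": {"hits": [
                        _hit("KZ", "blue", 180, 95),
                    ]}}},
                ]}},
                {"key": "RU", "doc_count": 1, "colors": {"buckets": [
                    {"key": "red", "doc_count": 1, "docs": {"hits": {"hits": [
                        _hit("RU", "red", 180, 90),
                    ]}}},
                ]}},
            ]
        }
    }
}

result = group_hits(sample_response)
```

Two things to keep in mind: a terms aggregation returns only the top 10 buckets by default, so set "size" on the countries and colors aggregations if you have more distinct values than that; and both fields need to be mapped so that the aggregation sees whole values (for example keyword fields) rather than analyzed tokens.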
