Kralizek
Kralizek

Reputation: 2063

How to divide the returned documents into a fix number of buckets in Elasticsearch

I have a question that I thought was simple to google but I couldn't find an answer.

I have an index of articles with an optional field price among the many.

I'd like to create an aggregation that divides all the articles matching the query in buckets that match these properties:

To give an idea of the ideal result, given a search returning items with prices ranging from 0 to 2000, the result of the aggregation could be something like:

[
   {"from": 0, "to": 0, "doc_count": 100},
   {"from": 1, "to": 200, "doc_count": 23},
   {"from": 201, "to": 400, "doc_count": 11},
   {"from": 401, "to": 600, "doc_count": 23},
   {"from": 601, "to": 800, "doc_count": 120},
   {"from": 801, "to": 1000, "doc_count": 1100},
   {"from": 1001, "to": 1200, "doc_count": 25},
   {"from": 1201, "to": 1400, "doc_count": 12},
   {"from": 1401, "to": 1600, "doc_count": 24},
   {"from": 1601, "to": 1800, "doc_count": 42},
   {"from": 1801, "to": 2000, "doc_count": 52}
]

I have looked into histogram aggregation but it focuses on fixed intervals rather than fixed amount of buckets and I couldn't find a way to isolate the items with price = 0.

{
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 2500.0
      }
    }
  },
  "query": {
    "match_all": {}
  },
  "size": 0
}

With some backend processing I was able to obtain the following result

[
  {
    "Key": {
      "From": 0,
      "To": 2499
    },
    "Items": 139
  },
  {
    "Key": {
      "From": 2500,
      "To": 4999
    },
    "Items": 17
  },
  {
    "Key": {
      "From": 5000,
      "To": 7499
    },
    "Items": 7
  },
  {
    "Key": {
      "From": 7500,
      "To": 9999
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 10000,
      "To": 12499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 12500,
      "To": 14999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 15000,
      "To": 17499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 17500,
      "To": 19999
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 20000,
      "To": 22499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 22500,
      "To": 24999
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 25000,
      "To": 27499
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 27500,
      "To": 29999
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 30000,
      "To": 32499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 32500,
      "To": 34999
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 35000,
      "To": 37499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 37500,
      "To": 39999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 40000,
      "To": 42499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 42500,
      "To": 44999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 45000,
      "To": 47499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 47500,
      "To": 49999
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 50000,
      "To": 52499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 52500,
      "To": 54999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 55000,
      "To": 57499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 57500,
      "To": 59999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 60000,
      "To": 62499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 62500,
      "To": 64999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 65000,
      "To": 67499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 67500,
      "To": 69999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 70000,
      "To": 72499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 72500,
      "To": 74999
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 75000,
      "To": null
    },
    "Items": 1
  }
]

I also have discarded using the range aggregation as it requires the definition of well-defined buckets by their boundaries.

I hope I gave enough context about what I am trying to achieve and what I attempted.

Thanks.

EDIT: added paragraph about range aggregation.

Upvotes: 0

Views: 204

Answers (1)

rabbitbr
rabbitbr

Reputation: 3261

Range aggregation will allow you to create price ranges and so you will have a controlled limit of buckets you want

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-range-aggregation.html

{
  "aggs": {
    "price": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 0.0, "to": 1.0 },
          { "from": 1.0, "to": 10.0 },
          { "from": 10.0, "to": 20.0 },
          { "from": 20.0 }
        ]
      }
    }
  }, 
  "size": 0
}

Upvotes: 1

Related Questions