lulijun
lulijun

Reputation: 575

How to add paging in Elasticsearch's aggregation?

I have an elasticsearch request as below:

{
    "size":0,
    "aggs":{
        "group_by_state":{
            "terms":{
                "field":"poi_id"
            },
            "aggs":{
                "sum(price)":{
                    "sum":{
                        "field":"price"
                    }
                }
            }
        }
    }
}

I want to add paging in this requst, just like

select poi_id, sum(price) from table group by poi_id limit 0,2

I've searched a lot, and found a link about it:https://github.com/elastic/elasticsearch/issues/4915.

But still I didn't get the implementation method.

Is there any way to implement it by Elasticsearch itself but not my application?

Upvotes: 5

Views: 7543

Answers (3)

Peter Colclough
Peter Colclough

Reputation: 86

Late to the party, but just discovered 'composite' aggregations in v6.3+. These allow:
1. A more 'Sql like' grouping
2. Pagination by use of the 'after_key'.
Saved our day, hope it will help others too.

Example, getting number of hits per hour between 2 dates, grouped on 5 fields:

GET myindex-idx/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"docType": "myDOcType"}}, 
        {"range": {
          "@date": {"gte": "2019-06-19T21:00:00", "lt": "2019-06-19T22:00:00"}
          }
        }
      ]
    }
  }, 
  "size": 0, 
  "aggs": {
    "mybuckets": {
      "composite": {
        "size": 100, 
        "sources": [
          {"@date": {
            "date_histogram": {
              "field": "@date", 
              "interval": "hour", 
              "format": "date_hour"}
            }
          }, 
          {"field_1": {"terms": {"field": "field_1"}}}, 
          {"field_2": {"terms": {"field": "field_2"}}}, 
          {"field_3": {"terms": {"field": "field_3"}}}, 
          {"field_4": {"terms": {"field": "field_4"}}}, 
          {"field_5": {"terms": {"field": "field_5"}}}
        ]
      }
    }
  }
}

Produces:

{
  "took": 255,
  "timed_out": false,
  "_shards": {
    "total": 80,
    "successful": 80,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 46989,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "mybuckets": {
      "after_key": {
        "@date": "2019-06-19T21",
        "field_1": 262,
        "field_2": 347,
        "field_3": 945,
        "field_4": 2258,
        "field_5": 0
      },
      "buckets": [
        {
          "key": {
            "@date": "2019-06-19T21",
            "field_1": 56,
            "field_2": 106,
            "field_3": 13224,
            "field_4": 46239,
            "field_5": 0
          },
          "doc_count": 3
        },
        {
          "key": {
            "@date": "2019-06-19T21",
            "field_1": 56,
            "field_2": 106,
            "field_3": 32338,
            "field_4": 76919,
            "field_5": 0
          },
          "doc_count": 2
        },
        ....

Following paging query issued like this, using the 'after_key object in the queries 'after' object:

GET myindex-idx/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"docType": "myDOcType"}}, 
        {"range": {
          "@date": {"gte": "2019-06-19T21:00:00", "lt": "2019-06-19T22:00:00"}
          }
        }
      ]
    }
  }, 
  "size": 0, 
  "aggs": {
    "mybuckets": {
      "composite": {
        "size": 100, 
        "sources": [
          {"@date": {
            "date_histogram": {
              "field": "@date", 
              "interval": "hour", 
              "format": "date_hour"}
            }
          }, 
          {"field_1": {"terms": {"field": "field_1"}}}, 
          {"field_2": {"terms": {"field": "field_2"}}}, 
          {"field_3": {"terms": {"field": "field_3"}}}, 
          {"field_4": {"terms": {"field": "field_4"}}}, 
          {"field_5": {"terms": {"field": "field_5"}}}
        ],
      "after": {
        "@date": "2019-06-19T21",
        "field_1": 262,
        "field_2": 347,
        "field_3": 945,
        "field_4": 2258,
        "field_5": 0
        }
      }
    }
  }
}

This pages through the results, until the mybuckets returns empty

Upvotes: 3

bburrier
bburrier

Reputation: 1459

I am working through a solution for paging aggregation results currently. What you want to use is partition. This section in the official docs is very helpful. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions

To adapt your example, the terms setting would be updated as follows.

{
    "size":0,
    "aggs":{
        "group_by_state":{
            "terms":{
                "field":"poi_id",
                "include": {
                    "partition": 0,
                    "num_of_partitions": 100
                },
                "size": 10000
            },
            "aggs":{
                "sum(price)":{
                    "sum":{
                        "field":"price"
                    }
                }
            }
        }
    }
}

This will group your results into 100 partitions (num_of_partitions), with a max size of 10k results in each (size), and retrieve the first such partition (partition: 0)

If you have more than 10k unique values for the field you are aggregating on (and want to return all values) you will want to increase the size value or potentially compute size and num_of_partitions dynamically based on the cardinality of your field. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation

You might also want to use the show_term_doc_count_error setting to make sure your aggregation is returning accurate counts. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_per_bucket_document_count_error

Hope that's helpful.

Upvotes: 4

Ohlsen1980
Ohlsen1980

Reputation: 314

You can use the parameters from and size in your request. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html for more information. Your request would be something like this:

{
    "from" : 0, 
    "size" : 10,
    "aggs":{
        "group_by_state":{
            "terms":{
                "field":"poi_id"
            },
            "aggs":{
                "sum(price)":{
                    "sum":{
                        "field":"price"
                    }
                }
            }
        }
    }
}

Upvotes: -2

Related Questions