Reuven Trabin

Reputation: 451

How to include all docs in ElasticSearch Aggregation and avoid sum_other_doc_count > 0

ES is not mainstream for my work, and there's one behavior I'm not able to correct. I have a fairly simple aggregation query:

GET /my_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "request_type": "some_type"
          }
        },
        {
          "match": {
            "carrier_name.keyword": "some_carrier"
          }
        }
      ]
    }
  },
  "aggs": {
    "by_date": {
      "terms": {
        "field": "date",
        "order": {
          "_term": "asc"
        }
      },
      "aggs": {
        "carrier_total": {
          "sum": {
            "field": "total_count"
          }
        }
      }
    }
  }
}

My understanding from https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html is that not all documents are included in the aggregation. Indeed, depending on the query section, the results show "sum_other_doc_count" with values greater than zero.

My question: is there a way to construct the search so that all docs are included? The number of documents is fairly small, typically under 1k.

Thanks in advance, Reuven

Upvotes: 5

Views: 8469

Answers (2)

Bhavya

Reputation: 16172

According to the documentation,

size defaults to 10

from + size can not be more than the index.max_result_window index setting, which defaults to 10,000.

In your case the number of documents is fairly small, under 1k, so a bucket for every distinct date can easily be returned.

The size parameter can be set to define how many term buckets should be returned out of the overall terms list. By default, the node coordinating the search process will request each shard to provide its own top size term buckets and once all shards respond, it will reduce the results to the final list that will then be returned to the client.
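To make the effect of size concrete, here is a minimal Python sketch of what happens on a single shard. It is only a simplified model, not Elasticsearch's actual implementation: the function name terms_agg and the sample dates are made up for illustration. Buckets beyond size are dropped, and their document counts are what end up in sum_other_doc_count.

```python
from collections import Counter


def terms_agg(values, size=10):
    """Simplified single-shard model of a terms aggregation (hypothetical helper).

    Buckets are ordered by key ascending, like "order": {"_term": "asc"}.
    """
    counts = Counter(values)
    ordered = sorted(counts.items())   # sort buckets by key, ascending
    kept = ordered[:size]              # only the first `size` buckets are returned
    # Documents in the dropped buckets are only reported as a total.
    other = sum(count for _, count in ordered[size:])
    return {
        "buckets": [{"key": key, "doc_count": count} for key, count in kept],
        "sum_other_doc_count": other,
    }


# Made-up data: 15 distinct dates, 2 documents per date (30 documents).
dates = [f"2020-01-{day:02d}" for day in range(1, 16) for _ in range(2)]

default = terms_agg(dates)             # size defaults to 10
large = terms_agg(dates, size=1000)    # size larger than the number of distinct dates

print(len(default["buckets"]), default["sum_other_doc_count"])  # 10 10
print(len(large["buckets"]), large["sum_other_doc_count"])      # 15 0
```

With the default size, the 5 dropped dates account for 10 documents. Once size is at least the number of distinct dates, nothing is left over.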

So the request should ask for up to 1000 buckets on the date field:

...

"by_date": {
  "terms": {
    "field": "date",
    "order": {
      "_term": "asc"
    },
    "size": 1000
  }
}

...

The higher the requested size is, the more accurate the results will be, but the more expensive it will be to compute the final results.

To learn more about this, you can refer to the official doc.
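If the request is built from code, a small check on the response can confirm that the chosen size was large enough. The following is a rough Python sketch: build_request and covers_all_docs are hypothetical helper names, the field names come from the question, and the sample response is hand-written rather than real output. Sending the body to the cluster is left out.

```python
import json


def build_request(request_type, carrier, bucket_size=1000):
    """Build the search body from the question, with an explicit terms size."""
    return {
        "size": 0,  # no hits, only aggregations
        "query": {
            "bool": {
                "must": [
                    {"match": {"request_type": request_type}},
                    {"match": {"carrier_name.keyword": carrier}},
                ]
            }
        },
        "aggs": {
            "by_date": {
                "terms": {
                    "field": "date",
                    # "_term" as in the question; newer releases use "_key" instead.
                    "order": {"_term": "asc"},
                    "size": bucket_size,
                },
                "aggs": {"carrier_total": {"sum": {"field": "total_count"}}},
            }
        },
    }


def covers_all_docs(response, agg_name="by_date"):
    """Return True when no matching documents fell outside the returned buckets."""
    return response["aggregations"][agg_name]["sum_other_doc_count"] == 0


body = build_request("some_type", "some_carrier")
print(json.dumps(body["aggs"]["by_date"]["terms"]))

# Hand-written response fragment, for illustration only.
sample_response = {
    "aggregations": {
        "by_date": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [],
        }
    }
}
print(covers_all_docs(sample_response))
```

If covers_all_docs returns False, the size was smaller than the number of distinct dates matched by the query and can be raised.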

Upvotes: 9

Joe - Check out my books

Reputation: 16895

Increase the size of the terms agg from the default 10 to a large-ish number:

...
    "by_date": {
      "terms": {
        "field": "date",
        "order": {
          "_term": "asc"
        },
        "size": 1000           <-----
      }
...

Upvotes: 0
