Reputation: 1105
Why does this aggregation:
"aggs": {
  "Condition": {
    "terms": {
      "field": "color",
      "size": 10,
      "min_doc_count": 1
    }
  }
}
run drastically faster than this one:
"aggs": {
  "Condition": {
    "terms": {
      "field": "color",
      "size": 10,
      "min_doc_count": 0
    }
  }
}
even though both return the same aggregation result for me?
Upvotes: 1
Views: 4256
Reputation: 17441
To add to @moliware's answer, which quotes this excerpt from the documentation:
Setting min_doc_count=0 will also return buckets for terms that didn’t match any hit. However, some of the returned terms which have a document count of zero might only belong to deleted documents or documents from other types, so there is no warranty that a match_all query would find a positive document count for those terms.
Besides deleted documents, another significant caveat with min_doc_count=0 is that the aggregation is not restricted to documents that match the parent query, nor to the types being searched.
See the example below:
1) Create test index
PUT test
2) Insert documents of type1 and type3
POST _bulk
{"index":{"_index":"test","_type":"type1","_id":"1"}}
{"condition":"good"}
{"index":{"_index":"test","_type":"type1","_id":"2"}}
{"condition":"bad"}
{"index":{"_index":"test","_type":"type1","_id":"3"}}
{"condition":"soso"}
{"index":{"_index":"test","_type":"type1","_id":"4"}}
{"condition":"excellent"}
{"index":{"_index":"test","_type":"type1","_id":"5"}}
{"condition":"bad"}
{"index":{"_index":"test","_type":"type1","_id":"6"}}
{"condition":"bad"}
{"index":{"_index":"test","_type":"type1","_id":"7"}}
{"condition":"excellent"}
{"index":{"_index":"test","_type":"type3","_id":"1"}}
{"condition":"unwell"}
3) Query all documents of type1 that do not contain the term bad:
POST test/type1/_search
{
  "query": {
    "bool": {
      "must_not": {
        "term": {
          "condition": "bad"
        }
      }
    }
  },
  "aggs": {
    "condition_value": {
      "terms": {
        "field": "condition",
        "size": 10,
        "min_doc_count": 0
      }
    }
  }
}
Response:
"aggregations": {
  "condition_value": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "excellent",
        "doc_count": 2
      },
      {
        "key": "good",
        "doc_count": 1
      },
      {
        "key": "soso",
        "doc_count": 1
      },
      {
        "key": "bad",
        "doc_count": 0
      },
      {
        "key": "unwell",
        "doc_count": 0
      }
    ]
  }
}
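Note that the results contain buckets for unwell, which only exists in type:type3, and for condition:bad, which the query excludes. Both have a doc_count of 0.
Terms aggregations are ordered by doc_count by default, and the OP uses size:10, so the zero-count buckets may never reach the top 10 and the result looks unaffected. Setting size:0 (return all buckets, in the Elasticsearch versions that support it) would give a better picture. In short, the number of terms considered when generating the aggregation is significantly larger with min_doc_count:0.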
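To make the point concrete, here is a minimal Python sketch of the behaviour described in this answer. It is a toy model, not Elasticsearch code: `toy_terms_agg`, `query`, and `DOCS` are hypothetical names, the data mirrors the bulk request from the example, and breaking ties by term is an assumption made only to keep the output deterministic.

```python
from collections import Counter

# Toy data mirroring the example: (type, condition) pairs.
DOCS = [
    ("type1", "good"), ("type1", "bad"), ("type1", "soso"),
    ("type1", "excellent"), ("type1", "bad"), ("type1", "bad"),
    ("type1", "excellent"), ("type3", "unwell"),
]


def toy_terms_agg(docs, matches, size=10, min_doc_count=1):
    """Simplified stand-in for a terms aggregation.

    Returns (top buckets, number of candidate terms considered).
    """
    # Count terms only for documents that match the query.
    counts = Counter(cond for typ, cond in docs if matches(typ, cond))
    if min_doc_count == 0:
        # Every term in the index becomes a candidate, matched or not.
        for _, cond in docs:
            counts.setdefault(cond, 0)
    candidates = {t: c for t, c in counts.items() if c >= min_doc_count}
    # Order by doc_count descending, then by term (assumed tie-break).
    ordered = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:size], len(candidates)


def query(typ, cond):
    # Same restriction as the example: type1 documents without "bad".
    return typ == "type1" and cond != "bad"


top_1, seen_1 = toy_terms_agg(DOCS, query, size=3, min_doc_count=1)
top_0, seen_0 = toy_terms_agg(DOCS, query, size=3, min_doc_count=0)
print(top_1 == top_0, seen_1, seen_0)  # True 3 5
```

With a small `size`, both calls return the same top buckets, yet the `min_doc_count=0` call had to consider every term in the index. On a high-cardinality field that difference in candidate terms is where the extra time would plausibly go.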
Upvotes: 2
Reputation: 10278
Extracted from the documentation:
Setting min_doc_count=0 will also return buckets for terms that didn’t match any hit. However, some of the returned terms which have a document count of zero might only belong to deleted documents or documents from other types, so there is no warranty that a match_all query would find a positive document count for those terms.
So it seems that if you have lots of deleted documents, performance would be worse because the aggregation has to process a larger number of terms and documents. Try optimizing the index, which expunges deleted documents, and see whether the performance of the two requests becomes similar.
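As a rough illustration of why deleted documents could matter, the Python sketch below models a segment in which a delete only marks the document, and terms disappear only when the segment is rewritten. This is an assumption-laden toy: `ToySegment` and its methods are hypothetical and do not correspond to any Elasticsearch or Lucene API.

```python
class ToySegment:
    """Hypothetical, heavily simplified model of an index segment."""

    def __init__(self, docs):
        self.docs = list(docs)        # one term per document
        self.deleted = set()          # positions marked as deleted
        self.terms = set(self.docs)   # term dictionary built at write time

    def delete(self, pos):
        # Deleting only marks the document; the term dictionary is untouched.
        self.deleted.add(pos)

    def merged(self):
        # Rewriting the segment keeps live documents only, so terms that
        # belonged solely to deleted documents disappear.
        live = [t for i, t in enumerate(self.docs) if i not in self.deleted]
        return ToySegment(live)


seg = ToySegment(["red", "blue", "green", "red"])
seg.delete(1)  # "blue"
seg.delete(2)  # "green"
before = sorted(seg.terms)
after = sorted(seg.merged().terms)
print(before, after)  # ['blue', 'green', 'red'] ['red']
```

In this model, a `min_doc_count=0` aggregation over the unmerged segment would still see `blue` and `green` as zero-count terms, which matches the caveat in the documentation excerpt.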
Upvotes: 2