Elasticsearch terms aggregation incorrect

Question

I have a field that stores an array of strings. Different documents hold different sets of strings.

Example: "ftypes": ["PDF", "TXT", "XML"]

I used this aggregation query to analyze the usage of each file type:

{
  "aggs": {
    "list": {
      "terms": {
        "field": "ftypes",
        "min_doc_count": 0,
        "size": 100000
      }
    }
  }
}

Result:
{
    "took": 11,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 137265,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "list": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "PDF",
                    "doc_count": 134475
                },
                {
                    "key": "TXT",
                    "doc_count": 21312
                },
                {
                    "key": "XML",
                    "doc_count": 6597
                },
                {
                    "key": "JPG",
                    "doc_count": 1233
                }
            ]
        }
    }
}

The results were correct, as expected. Recently, however, I updated this field after removing XML file support, so none of the documents has the file type XML. I can confirm that with this query:

{
  "query": {
    "terms": {
      "ftypes": ["XML"]
    }
  }
}

Result:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    }
}

The total hit count is zero. The strange thing is that when I run the aggregation query above again, I still see XML as a term, with a doc count of zero:

{
    "took": 11,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 137265,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "list": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "PDF",
                    "doc_count": 134475
                },
                {
                    "key": "TXT",
                    "doc_count": 21312
                },
                {
                    "key": "JPG",
                    "doc_count": 1233
                },
                {
                    "key": "XML",
                    "doc_count": 0
                }
            ]
        }
    }
}

Where is this XML term coming from if it does not exist in any document? Is there a cache that I need to clear?
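
One thing that may be worth checking is the "min_doc_count": 0 setting in the aggregation above. With that setting, the terms aggregation also returns buckets for terms that matched no hit, and such terms may belong only to deleted documents whose old versions have not yet been merged out of the index. As a sketch, assuming zero-count buckets are not actually needed, the same aggregation with "min_doc_count" raised to 1 (the default) would look like this; the "size": 0 on the request is an addition here that only suppresses the hits list:

{
  "size": 0,
  "aggs": {
    "list": {
      "terms": {
        "field": "ftypes",
        "min_doc_count": 1,
        "size": 100000
      }
    }
  }
}

If XML disappears from the buckets with this variant, the term is most likely a leftover from the pre-update versions of the documents rather than something held in a cache.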
