Reputation: 569
I am currently trying to aggregate a field in Elasticsearch. When I run the same query against other indices it gives me the correct sum, but for one index the result is exceedingly high and incorrect.
Following is the Elasticsearch query:
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*"
        }
      },
      "filter": {
        "and": [
          {
            "range": {
              "start_timestamp": {
                "from": start_date,
                "to": end_date
              }
            }
          },
          {
            "term": { "id": ad_id }
          }
        ]
      }
    }
  },
  "aggs": {
    "type1": {
      "terms": {
        "field": "type_a",
        "size": 0,
        "order": {
          "revenue": "desc"
        }
      },
      "aggs": {
        "revenue": {
          "sum": {
            "field": "revenue"
          }
        }
      }
    }
  }
}
I tried checking by downloading all the matching documents and summing the field up in Python, and that gives me the correct number, which leads me to believe the problem might be related to my query. I checked the mapping for the field "revenue" and it is "double".
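For reference, the client-side check looked roughly like this (a minimal sketch using the official elasticsearch-py client; the index name "my_index", the concrete date range, and the ad id are placeholders, not my real values):

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan

    es = Elasticsearch(["localhost:9200"])

    # Same filtered query as above; dates and id are placeholder values.
    query = {
        "_source": ["revenue"],
        "query": {
            "filtered": {
                "query": {"query_string": {"query": "*"}},
                "filter": {
                    "and": [
                        {"range": {"start_timestamp": {"from": "2015-09-01", "to": "2015-09-30"}}},
                        {"term": {"id": 12345}}
                    ]
                }
            }
        }
    }

    # Stream every matching document and sum "revenue" on the client side.
    total = 0.0
    for hit in scan(es, query=query, index="my_index"):
        total += hit["_source"].get("revenue", 0.0)

    print("client-side sum of revenue:", total)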
Is it some kind of overflow problem?
Thanks!
Solution that worked for me: the post linked in the comments below.
Upvotes: 3
Views: 892
Reputation: 569
So I was able to find the answer on the Elasticsearch discussion forum. According to an Elasticsearch developer, this happens due to dynamic mapping.
This happens only rarely (hence why you only see it on one of your indices) when two shards dynamically map the same field as different types at the same time (one shard may see a double value and map the field to a double whilst the other sees a long value and maps the field to a long). This is a known bug in 1.x and will be fixed in the upcoming 2.0 release (the beta for this release is available now but DO NOT use this in production). To work around this bug you will need to re-index your data into an index with explicit mappings for your fields (especially your numeric fields).
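To illustrate the workaround, here is a rough sketch of creating a new index with explicit mappings before re-indexing (elasticsearch-py with 1.x-style mappings; the index name, document type and the exact field types are assumptions based on the fields in the question):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Explicit mappings so no shard can dynamically map "revenue" as a long.
    es.indices.create(
        index="my_index_v2",
        body={
            "mappings": {
                "my_type": {
                    "properties": {
                        "start_timestamp": {"type": "date"},
                        "id": {"type": "long"},
                        "type_a": {"type": "string", "index": "not_analyzed"},
                        "revenue": {"type": "double"}
                    }
                }
            }
        }
    )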
There is also a Python module to reindex your data (I did the reindexing manually in my own Python code). Following is the link to the Python helper:
http://elasticsearch-py.readthedocs.org/en/latest/helpers.html
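For what it's worth, a minimal sketch of using that helper might look like this (index names are placeholders; as noted above, I did the reindexing manually instead):

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["localhost:9200"])

    # Copy all documents from the old index into the new one that has explicit mappings.
    helpers.reindex(client=es, source_index="my_index", target_index="my_index_v2")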
Upvotes: 2