Reputation: 3580
I have the queries below. When the date_partition field has "type" => "float", the document is returned for queries on 20220109, 20220108, and 20220107. When the field has "type" => "long", only the 20220109 query returns it, which is what I want.
For each of the float queries below, the result is returned as if the query 20220109 had been sent. --> 20220109, 20220108, 20220107
PUT date
{
  "mappings": {
    "properties": {
      "date_partition_float": {
        "type": "float"
      },
      "date_partition_long": {
        "type": "long"
      }
    }
  }
}
POST date/_doc
{
  "date_partition_float": "20220109",
  "date_partition_long": "20220109"
}
# returns the document
GET date/_search
{
  "query": {
    "match": {
      "date_partition_float": "20220108"
    }
  }
}
# returns nothing
GET date/_search
{
  "query": {
    "match": {
      "date_partition_long": "20220108"
    }
  }
}
Is this a bug, or is this how the float type works?
Two years of data are loaded into Elasticsearch (like day-1, day-2), with about 20 GB primary shard size per day and 15 TB in total. What is the best way to change the type of just this field? I have 5 float fields in my mapping; what is the fastest way to change all of them?
Note: I have the solutions below in mind, but I'm afraid they are slow.
Upvotes: 1
Views: 1059
Reputation: 3580
Here is the answer to my question => https://discuss.elastic.co/t/elasticsearch-data-type-float-returns-incorrect-results/300335
You're running into a Java quirk here (it works as designed, however). To reproduce it, run jshell locally and type:
Float.valueOf(20220109.0f); the result is 2.0220108E7, because floating point values are not stored exactly and get rounded. A 32-bit float has a 24-bit significand, so above 16777216 (2^24) it can no longer represent every integer; in the range of these values only even integers are exact.
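The rounding can be checked outside Elasticsearch. The following is a minimal Java sketch, assuming the float field applies the same 32-bit IEEE 754 rounding to both the indexed value and the query term; the class and method names (FloatPrecisionDemo, asStoredFloat) are made up for illustration.

```java
public class FloatPrecisionDemo {

    // Round-trips an integer through a 32-bit float, mimicking (by assumption)
    // what happens to a value indexed into, or queried against, a float field.
    static long asStoredFloat(long value) {
        return (long) (float) value;
    }

    public static void main(String[] args) {
        long[] queries = {20220107L, 20220108L, 20220109L, 20220110L};
        for (long q : queries) {
            // Print each original value next to its float-rounded counterpart.
            System.out.println(q + " -> " + asStoredFloat(q));
        }
    }
}
```

The first three values all collapse to 20220108, which is consistent with the three queries hitting the same document, while 20220110 is even and stays exact. A long keeps all four values distinct.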
You can use the reindex functionality to reindex your data into an index with the mapping fixed (you could also add new fields to the existing index and use update-by-query, but I am not sure that is clean).
Upvotes: 1
Reputation: 217324
That date_partition field should have the date type with format=yyyyMMdd, that's the only sensible type to use, not long and even worse float.
PUT date
{
  "mappings": {
    "properties": {
      "date_partition": {
        "type": "date",
        "format": "yyyyMMdd"
      }
    }
  }
}
It's not logical to query for 20220108 and have the 20220109 document returned in the results.
Using the date type would also allow you to use proper time-based range queries and create date_histogram aggregations on your data.
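To illustrate why a real date type is safer than a number for time-based ranges, here is a small Java sketch using java.time. It assumes the yyyyMMdd format is interpreted the same way as by java.time's DateTimeFormatter; the class and method names (DatePartitionDemo, parse, daysBetween) are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

public class DatePartitionDemo {

    // Same pattern string as the "format" in the mapping above.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Parses a partition value such as "20220109" into a calendar date.
    static LocalDate parse(String partition) {
        return LocalDate.parse(partition, FORMAT);
    }

    // Number of calendar days from one partition value to another.
    static long daysBetween(String from, String to) {
        return ChronoUnit.DAYS.between(parse(from), parse(to));
    }

    public static void main(String[] args) {
        // Adjacent days stay distinct once parsed as dates.
        System.out.println(parse("20220108").equals(parse("20220109")));
        // Calendar distance across a month boundary.
        System.out.println(daysBetween("20220131", "20220201"));
        // The same distance computed on the raw numbers.
        System.out.println(20220201 - 20220131);
    }
}
```

Treated as dates, 20220131 and 20220201 are one day apart; treated as plain numbers they differ by 70, so numeric ranges and bucket widths do not line up with calendar time.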
You can either recreate the index with the adequate type and reindex your data, or add a new field to your existing index and update it by query. Both options are valid.
Upvotes: 1