Reputation: 634
I am facing the following issue with an Elasticsearch query:
When I apply "fuzzy_like_this_field" to a String field, it works fine.
But when I apply "fuzzy_like_this_field" to a field of any other data type (e.g. double, Date), it does not work.
It fails with:
ElasticsearchIllegalArgumentException[fuzzy_like_this_field doesn't support binary/numeric fields.
Here is the Elasticsearch query:
{"query": {"bool": {"must": [{"fuzzy_like_this_field": {"Receipts.retailerId": {"like_text": "55f5878916c042cc8731a39e4e05b7a0","fuzziness":0.3}}},{"fuzzy_like_this_field": {"Receipts.totalCost": {"like_text": "10","fuzziness":0.3}}}],"must_not": [],"should": []}},"from": 0,"size": 1000,"sort": [],"facets": {}}
Here retailerId is a String and totalCost is a double.
If I change the data type of totalCost from double to String, it works.
Can you suggest a solution?
Upvotes: 0
Views: 458
Reputation: 3209
Fuzzy queries expand text search results to include terms within a certain Levenshtein distance of the query term (the number of characters that need to be changed or transposed to match). For numeric values, they expand the match by a margin around the query value: -fuzziness <= field value <= +fuzziness. However, fuzzy_like_this and fuzzy_like_this_field only seem to support string matching (via Levenshtein distance).
The fuzzy_like_this and fuzzy_like_this_field queries are deprecated in ES 1.6+, and they both suffer from performance issues. You should find another method for accomplishing your goal.
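If what you want for totalCost is the numeric margin described above, one possible alternative is a plain range clause. The sketch below only builds the request body as a Python dict; the helper name numeric_fuzzy_range is made up for illustration, and the choice of 0.3 as the margin is an assumption carried over from the question's fuzziness value, not something the original query defines for numbers.

```python
import json


def numeric_fuzzy_range(field, value, fuzziness):
    """Build a range clause matching value - fuzziness <= field <= value + fuzziness.

    Hypothetical helper: it only assembles a dict and does not talk to a cluster.
    """
    return {
        "range": {
            field: {
                "gte": value - fuzziness,
                "lte": value + fuzziness,
            }
        }
    }


# Assumed example values taken from the question: totalCost near 10, margin 0.3.
clause = numeric_fuzzy_range("Receipts.totalCost", 10, 0.3)
print(json.dumps(clause, indent=2))
```

The resulting clause could take the place of the second fuzzy_like_this_field entry in the "must" array.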
There are a number of ways to apply fuzzy matching, but I'm not sure fuzzy matching is what you're after.
By specifying:
"fuzzy_like_this_field": {
  "Receipts.retailerId": {
    "like_text": "55f5878916c042cc8731a39e4e05b7a0",
    "fuzziness": 0.3
  }
}
You're asking to match all retailerIds which match the like_text with up to 22 edits: edit distance = length(term) * (1.0 - fuzziness) = 32 * 0.7 = 22.4.
So in this case 55ddddddd6c0ddddddd1a3dddddddda0 would qualify as a fuzzy match for 55f5878916c042cc8731a39e4e05b7a0, because 10 of the 32 characters share the same position, leaving only 22 to change.
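To check that arithmetic, here is a small sketch. It assumes the edit budget is the formula quoted above rounded down, and it counts per-position differences (Hamming distance) rather than a full Levenshtein distance; for two strings of equal length that count is an upper bound on the Levenshtein distance. Both function names are made up for illustration.

```python
def max_edits(term, fuzziness):
    """Edit budget per the formula above, rounded down (assumed rounding)."""
    return int(len(term) * (1.0 - fuzziness))


def differing_positions(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


like_text = "55f5878916c042cc8731a39e4e05b7a0"
candidate = "55ddddddd6c0ddddddd1a3dddddddda0"

budget = max_edits(like_text, 0.3)
changed = differing_positions(like_text, candidate)

print(budget)             # 22
print(changed)            # 22
print(changed <= budget)  # True
```

So 22 substitutions are enough to turn one string into the other, which fits within the budget of 22 edits.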
If, instead, you're merely looking for duplicate transactions, why not simply use a match query or filter to match your retailerId and totalCost exactly?
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Receipts.retailerId": "55f5878916c042cc8731a39e4e05b7a0"
          }
        },
        {
          "match": {
            "Receipts.totalCost": 10
          }
        }
      ]
    }
  }
}
Upvotes: 1