Ranjeet

Reputation: 634

Elasticsearch: "fuzzy_like_this_field" filter query is not working

I am facing the following issue with an Elasticsearch filter:

When I apply "fuzzy_like_this_field" to a String field, it works fine.

But when I apply "fuzzy_like_this_field" to a field of any other data type (e.g. double, Date), it does not work.

It fails with:

ElasticsearchIllegalArgumentException[fuzzy_like_this_field doesn't support binary/numeric fields.

Here is the Elasticsearch query:

{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy_like_this_field": {
            "Receipts.retailerId": {
              "like_text": "55f5878916c042cc8731a39e4e05b7a0",
              "fuzziness": 0.3
            }
          }
        },
        {
          "fuzzy_like_this_field": {
            "Receipts.totalCost": {
              "like_text": "10",
              "fuzziness": 0.3
            }
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 1000,
  "sort": [],
  "facets": {}
}

Here retailerId is a String and totalCost is a double.

If I change the data type of totalCost from double to String, it works.

Can anyone suggest a solution?

Upvotes: 0

Views: 458

Answers (1)

Peter Dixon-Moses

Reputation: 3209

Fuzzy queries expand text search results to include terms within a certain Levenshtein distance of the query term (the number of characters that need to be changed or transposed to match). For numeric values they instead expand the match by a margin: -fuzziness <= value <= +fuzziness. However, fuzzy_like_this and fuzzy_like_this_field only seem to support string matching (via Levenshtein distance).


fuzzy_like_this and fuzzy_like_this_field queries are deprecated in ES 1.6+, and both suffer from performance issues. You should find another method for accomplishing your goal.

There are a number of ways to apply fuzzy matching, but I'm not sure fuzzy matching is what you're after.

By specifying:

"fuzzy_like_this_field":{  
                  "Receipts.retailerId":{  
                     "like_text":"55f5878916c042cc8731a39e4e05b7a0",
                     "fuzziness":0.3
                  }
               }

You're asking to match all retailerIds that match the like_text with up to 22 edits: edit distance = length(term) * (1.0 - fuzziness) = 32 * 0.7 = 22.4

So in this case 55ddddddd6c0ddddddd1a3dddddddda0 would qualify as a fuzzy match to 55f5878916c042cc8731a39e4e05b7a0, because 10 of the 32 characters share the same position and only the remaining 22 need to change.
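The arithmetic above can be checked with a short Python sketch. It is only an illustration: `levenshtein` is a plain textbook edit-distance implementation, not Elasticsearch's own code, and `max_edits` is a hypothetical helper that applies the formula quoted above and assumes the result is rounded down.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def max_edits(term: str, fuzziness: float) -> int:
    # Formula from the answer: length(term) * (1.0 - fuzziness).
    # Rounding down is an assumption made for this sketch.
    return int(len(term) * (1.0 - fuzziness))


like_text = "55f5878916c042cc8731a39e4e05b7a0"
candidate = "55ddddddd6c0ddddddd1a3dddddddda0"

print(max_edits(like_text, 0.3))          # 22
print(levenshtein(like_text, candidate))  # 22
```

The candidate sits exactly at the allowed distance, which shows how loose a fuzziness of 0.3 is for a 32-character identifier.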


If, instead, you're merely looking for duplicate transactions, why not simply use a match query or filter to match your retailerId and totalCost exactly?

"query":{  
      "bool":{  
         "must":[  
            {  
               "match":{  
                  "Receipts.retailerId": "55f5878916c042cc8731a39e4e05b7a0" 
               }
            },
            {  
               "match":{  
                  "Receipts.totalCost": 10
               }
            }
         ]
      }
   }
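If some tolerance on totalCost is still wanted, the numeric margin described earlier can be written explicitly as a range query instead of relying on fuzzy_like_this_field. The sketch below builds such a request body in Python; `receipt_query` and its `tolerance` parameter are hypothetical names chosen for illustration, and whether a margin is appropriate at all depends on what you are trying to match.

```python
import json


def receipt_query(retailer_id: str, total_cost: float, tolerance: float) -> dict:
    """Build a bool query: exact retailerId, totalCost within +/- tolerance."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Exact match on the identifier, as in the answer's query.
                    {"match": {"Receipts.retailerId": retailer_id}},
                    # Explicit numeric margin (assumption: a range is acceptable here).
                    {"range": {"Receipts.totalCost": {
                        "gte": total_cost - tolerance,
                        "lte": total_cost + tolerance,
                    }}},
                ]
            }
        }
    }


body = receipt_query("55f5878916c042cc8731a39e4e05b7a0", 10, 0.5)
print(json.dumps(body, indent=2))
```

With a tolerance of 0.5 this matches receipts whose totalCost lies between 9.5 and 10.5; a tolerance of 0 gives the same bounds as an exact match.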

Upvotes: 1
