Reputation: 2001
I am using a match query with fuzziness on an alphanumeric term, but the results are not coming back as expected.
Here is the query I am running in Kibana:
GET index_name/_search
{
  "query": {
    "match": {
      "values": {
        "query": "A661752110",
        "operator": "and",
        "fuzziness": 1,
        "boost": 1.0,
        "prefix_length": 0,
        "max_expansions": 100
      }
    }
  }
}
I am expecting results like:
A661752110
A66175211012
A661752110111
A661752110-12
A661752110-111
But I am getting results like:
A661752110
A661752111
A661752119
Here are my mapping details:
PUT index_name
{
  "settings": {
    "analysis": {
      "analyzer": {
        "attr_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "values": {
          "type": "text",
          "analyzer": "attr_analyzer"
        },
        "id": {
          "type": "text"
        }
      }
    }
  }
}
Upvotes: 0
Views: 989
Reputation: 858
Fuzzy matching allows us to treat two words that are "fuzzily" similar as if they were the same word. Elasticsearch uses the Damerau-Levenshtein distance to measure the similarity of two strings. The Damerau-Levenshtein distance is the number of single-character edits needed to turn one string into the other, allowing four kinds of edits:
- substitution of one character for another
- insertion of a character
- deletion of a character
- transposition of two adjacent characters
The edit distance is controlled in the search request with the fuzziness parameter. You specified a fuzziness of 1, which means Elasticsearch will only return strings obtained by performing one edit (substitution, insertion, deletion, or transposition) on "A661752110".
The words you were expecting that did not show up have an edit distance strictly greater than 1. Please note that in Elasticsearch the maximum authorized value of fuzziness is 2.
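To see concretely why your expected terms fall outside fuzziness 1, here is a minimal Python sketch (not part of Elasticsearch itself) of the optimal string alignment variant of the Damerau-Levenshtein distance, applied to the terms from your question:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: counts single-character
    substitutions, insertions, deletions, and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # deleting i characters from a
    for j in range(len(b) + 1):
        d[0][j] = j  # inserting j characters into a
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

q = "A661752110"
print(osa_distance(q, "A661752111"))      # 1 -> within fuzziness 1, matched
print(osa_distance(q, "A66175211012"))    # 2 -> beyond fuzziness 1, not matched
print(osa_distance(q, "A661752110-111"))  # 4 -> far beyond, not matched
```

The terms you did get back ("A661752111", "A661752119") are all one substitution away from the query, while the longer terms you expected require at least two insertions.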
Some suggestions to achieve what you want:
- For A661752110-12 and A661752110-111 to match, you can use a tokenizer that splits the text when it finds a -. This is what the standard tokenizer does, for example.
- For A66175211012 and A661752110111, the best choice is a regexp query like this:

  {
    "query": {
      "regexp": {
        "values": {
          "value": "A661752110.{,3}"
        }
      }
    }
  }
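As a quick sanity check of that pattern, here is a sketch using Python's re module, assuming the {,3} quantifier means "zero to three of any character" as it does in Python:

```python
import re

# Full-string match against the suggested pattern: the literal prefix
# "A661752110" followed by up to three arbitrary characters.
pattern = re.compile(r"A661752110.{,3}")

for term in ["A661752110", "A66175211012", "A661752110111", "A661752110-12"]:
    print(term, bool(pattern.fullmatch(term)))  # all four match
```

Note that A661752110-111 carries four trailing characters, so it would need a wider bound (or the tokenizer approach above, which is the better fit for the hyphenated values anyway).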
Upvotes: 1