Phoenix

Reputation: 33

Elasticsearch search with matchQuery using fuzziness and shingle analyzer

I'm working with Elasticsearch and have run into the following problem. I defined an analyzer that uses a shingle filter and created a mapping that uses it.

Here's the code:

{
    "settings": {
        "analysis": {
            "char_filter": {
                "icons": {
                    "type": "mapping",
                    "mappings_path": "analysis/char_filter.txt"
                }
            },
            "filter": {
                "synonym_filter": {
                    "type": "synonym",
                    "synonyms_path": "analysis/synonym_filter.txt"
                },
                "shingle_filter":{
                    "type":"shingle",
                    "max_shingle_size": 2,
                    "min_shingle_size": 2,
                    "output_unigrams": true,
                    "token_separator": ""
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "filter": [
                        "lowercase",
                        "synonym_filter",
                        "shingle_filter"
                    ],
                    "char_filter": [
                        "icons"
                    ],
                    "tokenizer": "standard"
                }
            }
        }
    },
    "mappings": {
        "type-0": {
            "properties": {
                "text": {
                    "type": "text",
                    "analyzer": "my_analyzer"
                }
            }
        }
    }
}

Then I indexed a document:

{
   "text":"hello"
}

After that, I searched like this:

{
    "query":{
        "match":{
            "text":{
                "query":"hell world",
                "fuzziness":1
            }  
        }
    }
}

But it matches nothing. Then I changed my query to:

{
    "query":{
        "match":{
            "text":{
                "query":"world hell",
                "fuzziness":1
            }  
        }
    }
}

This request returns the document:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.21576157,
        "hits": [
            {
                "_index": "index-001",
                "_type": "product",
                "_id": "1",
                "_score": 0.21576157,
                "_source": {
                    "text": "hello"
                }
            }
        ]
    }
}

My Elasticsearch version is 6.2.4.

Can anyone tell me the reason?

[Screenshots not included: setup in Kibana; analyze results for "hell world", "hell", "world hell", and "hello"]

Upvotes: 3

Views: 1206

Answers (1)

Amit

Reputation: 32386

Combining fuzziness with shingle_filter is causing the issue. The documentation for fuzziness in the match query includes this note:

Fuzzy matching is not applied to terms with synonyms or in cases where the analysis process produces multiple tokens at the same position. Under the hood these terms are expanded to a special synonym query that blends term frequencies, which does not support fuzzy expansion.

Pay attention to the part about multiple tokens at the same position: fuzziness is not applied to tokens that share a position.

Now let's inspect the tokens generated for your search term hell world:

{
    "tokens": [
        {
            "token": "hell",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0 // position 0 for hell
        },
        {
            "token": "hellworld",
            "start_offset": 0,
            "end_offset": 10,
            "type": "shingle",
            "position": 0,  // position 0 again, for hellworld
            "positionLength": 2 
        },
        {
            "token": "world",
            "start_offset": 5,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 1    //position 1 
        }
    ]
}

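So position 0 holds two tokens, hell and hellworld, and fuzziness is not applied to either of them. Neither matches the indexed token hello exactly. The token world at position 1 is alone, so fuzziness does apply to it, but world is not within one edit of hello. As a result, the query returns nothing.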
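Token listings like the one above come from the _analyze API. As a rough sketch, the request could be built from Python with only the standard library. The base URL http://localhost:9200 and the helper names are assumptions for illustration; the index name index-001 and the analyzer name my_analyzer are taken from the question.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a POST request for the index-level _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url, index, analyzer, text):
    """Send the request and return the list of token dicts from the response."""
    request = build_analyze_request(base_url, index, analyzer, text)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["tokens"]


# Example usage against a running cluster (assumed to listen on localhost:9200):
# for token in analyze("http://localhost:9200", "index-001", "my_analyzer", "hell world"):
#     print(token["position"], token["token"])
```

Printing the position next to each token makes it easy to spot positions that hold more than one token.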

Now inspect the tokens of world hell:

{
    "tokens": [
        {
            "token": "world",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "worldhell",
            "start_offset": 0,
            "end_offset": 10,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "hell",
            "start_offset": 6,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 1   // hell is the only token at position 1, so fuzziness will be applied
        }
    ]
}

When you query with world hell, fuzziness is applied to the hell token, which is within one edit of the indexed token hello, so the document is returned.

If you change the search term to world hell elastic, hell shares its position with the shingle hellelastic, so fuzziness is no longer applied to it and the query returns no results again. I hope this clears things up.
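The behaviour described in this answer can be imitated with a small Python model. This is only a simplified sketch, not Elasticsearch's implementation: it assumes whitespace tokenization, ignores the synonym and char filters, uses plain Levenshtein distance (Elasticsearch also counts transpositions), and treats a position as fuzzy only when it holds a single token. All function names are hypothetical.

```python
def shingle_positions(text):
    """Model of the question's shingle_filter: size-2 shingles, unigrams kept,
    empty separator. Returns one list of tokens per position."""
    words = text.lower().split()
    positions = []
    for i, word in enumerate(words):
        tokens = [word]
        if i + 1 < len(words):
            # The shingle starts at the same position as its first word.
            tokens.append(word + words[i + 1])
        positions.append(tokens)
    return positions


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def would_match(query, indexed_terms, fuzziness=1):
    """True if any query position matches an indexed term (OR semantics).
    Fuzziness is applied only to positions that hold exactly one token."""
    for tokens in shingle_positions(query):
        if len(tokens) == 1:
            if any(edit_distance(tokens[0], term) <= fuzziness for term in indexed_terms):
                return True
        elif any(token in indexed_terms for token in tokens):
            return True
    return False


indexed = {"hello"}  # the single term produced by indexing the document "hello"
for query in ("hell world", "world hell", "world hell elastic"):
    print(query, "->", would_match(query, indexed))
```

Under these assumptions, only world hell matches, because it is the only query in which hell sits alone at its position.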

Upvotes: 4
