Mark

Reputation: 113

Elasticsearch "max_ngram_diff" working for "edge_ngram" but not for "ngram_tokenizer"

I've got an Elasticsearch index where I've set "max_ngram_diff": 50, but it only seems to work for the edge_ngram tokenizer, not for the ngram tokenizer.

I've made these two requests against the same URL http://localhost:9201/index-name/_analyze:

Request 1

{
    "tokenizer":
    {
        "type": "edge_ngram",
        "min_gram": 3,
        "max_gram": 20,
        "token_chars": [
            "letter",
            "digit"
        ]
    },
    "text": "1234567890;abcdefghijklmn;"
}

Request 2

{
    "tokenizer": {
        "type": "ngram",
        "min_gram": 3,
        "max_gram": 20,
        "token_chars": [
            "letter",
            "digit"
        ]
    },
    "text": "1234567890;abcdefghijklmn;"
}

The first request returns the expected result:

{
    "tokens": [
        {
            "token": "123",
            "start_offset": 0,
            "end_offset": 3,
            "type": "word",
            "position": 0
        },
        {
            "token": "1234",
            "start_offset": 0,
            "end_offset": 4,
            "type": "word",
            "position": 1
        },
        {
            "token": "12345",
            "start_offset": 0,
            "end_offset": 5,
            "type": "word",
            "position": 2
        },
        {
            "token": "123456",
            "start_offset": 0,
            "end_offset": 6,
            "type": "word",
            "position": 3
        }, 
        // more tokens
    ]
}

But the second request only returns this:

{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[ffe18f1a89e6][172.18.0.3:9300][indices:admin/analyze[s]]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [17]. This limit can be set by changing the [index.max_ngram_diff] index level setting."
    },
    "status": 400
}

Why can the first request with the edge_ngram tokenizer have a difference between max_gram and min_gram bigger than 1, while the second request with the ngram tokenizer can't?

These are my index settings:

{
    "settings": {
        "index": {
            "max_ngram_diff": 50,
            // further settings
         }
     }
}
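To rule out the setting not being applied at all, it can be read back from the index (same index-name placeholder and port as above; this is just a verification step, not part of the problem):

GET http://localhost:9201/index-name/_settings

The response should list "max_ngram_diff": "50" under the index settings, which it does in my case.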

The Elasticsearch version used is 7.2.0.
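For reference, the running version can be confirmed directly against the node (same host and port as above):

GET http://localhost:9201/

The "version.number" field of the response shows the server version.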

Thanks for your help!

Upvotes: 1

Views: 4966

Answers (1)

Mark

Reputation: 113

This behavior is related to ES version 7.2.0. Everything works as expected when using ES version 7.4.0.
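Judging from the container-style address in the error message (172.18.0.3), Elasticsearch appears to be running in Docker; in that case upgrading is mostly a matter of switching the image tag. A sketch, assuming a single-node setup and the 9201 port mapping from the question:

docker run -p 9201:9200 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.4.0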

Upvotes: 1
