Reputation: 113
I've got an Elasticsearch index where I've set "max_ngram_diff": 50, but somehow it only seems to work for the edge_ngram tokenizer, not for the ngram tokenizer.
I've made these two requests against the same URL http://localhost:9201/index-name/_analyze:
Request 1
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 3,
    "max_gram": 20,
    "token_chars": [
      "letter",
      "digit"
    ]
  },
  "text": "1234567890;abcdefghijklmn;"
}
Request 2
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 20,
    "token_chars": [
      "letter",
      "digit"
    ]
  },
  "text": "1234567890;abcdefghijklmn;"
}
The first request returns the expected result:
{
  "tokens": [
    {
      "token": "123",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "1234",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "12345",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 2
    },
    {
      "token": "123456",
      "start_offset": 0,
      "end_offset": 6,
      "type": "word",
      "position": 3
    },
    // more tokens
  ]
}
But the second request returns only this error:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[ffe18f1a89e6][172.18.0.3:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [17]. This limit can be set by changing the [index.max_ngram_diff] index level setting."
  },
  "status": 400
}
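For comparison, the same ngram request is accepted when the difference between max_gram and min_gram stays within the default limit of 1 mentioned in the error, e.g. with min_gram 3 and max_gram 4 (this variant is only meant as an illustration, it is not part of my original requests):

{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 4,
    "token_chars": [
      "letter",
      "digit"
    ]
  },
  "text": "1234567890;abcdefghijklmn;"
}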
Why can the first request with the edge_ngram tokenizer have a difference between max_gram and min_gram greater than 1, while the second request with the ngram tokenizer can't?
These are my index settings:
{
  "settings": {
    "index": {
      "max_ngram_diff": 50,
      // further settings
    }
  }
}
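As a side note, index.max_ngram_diff should be a dynamic index setting, so (assuming that holds for this version) it could also be raised on an existing index via the _settings API, with the index name and port mirroring the URL above:

PUT http://localhost:9201/index-name/_settings
{
  "index": {
    "max_ngram_diff": 50
  }
}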
The Elasticsearch version used is 7.2.0.
Thanks for your help!
Upvotes: 1
Views: 4966
Reputation: 113
This behavior is specific to ES version 7.2.0; everything works as expected with ES version 7.4.0.
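To double-check which version a node is actually running, a plain GET against the root endpoint (the port mirrors the one from the question) returns the version:

GET http://localhost:9201/

The response body contains a "version" object whose "number" field shows the running version, e.g. "7.2.0" or "7.4.0".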
Upvotes: 1