Reputation: 4270
I am applying an ngram filter to my string field:
"custom_ngram": {
"type": "ngram",
"min_gram": 3,
"max_gram": 10
}
But as a result I lose tokens shorter or longer than the ngram range.
Original tokens like "iq" or "a4", for example, cannot be found.
I am already applying some language-specific analysis before the ngram filter, so I would like to avoid copying the whole field. I am looking to expand the tokens with ngrams.
Any ideas or ngram-suggestions?
Here is an example of one of my analyzers that uses the custom_ngram filter:
"french": {
"type":"custom",
"tokenizer": "standard",
"filter": [
"french_elision",
"lowercase",
"french_stop",
"custom_ascii_folding",
"french_stemmer",
"custom_ngram"
]
}
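For illustration, the missing tokens can be reproduced with the _analyze API (a sketch; the index name my_index is an assumption):

POST my_index/_analyze
{
  "analyzer": "french",
  "text": "iq a4"
}

Both tokens are shorter than min_gram, so the returned token list is empty.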
Upvotes: 1
Views: 1164
Reputation: 1
I'm not sure whether this option existed back then, but the solution now is:
"custom_ngram": {
"type": "ngram",
"min_gram": 3,
"max_gram": 10,
"preserve_original" : true
}
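You can verify the behavior with the _analyze API, passing the filter definition inline (a sketch; no index is required):

POST _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "ngram",
      "min_gram": 3,
      "max_gram": 10,
      "preserve_original": true
    }
  ],
  "text": "iq"
}

With preserve_original enabled, "iq" is kept as a token even though it is shorter than min_gram.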
Upvotes: 0
Reputation: 4270
As Andrei Stefan pointed out, I had to go with multi-fields.
I did, and my mapping (for French) now looks like this:
"french_strings": {
"match": "*_fr",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"analyzer": "french",
"fields":{
"ngram":{
"type":"string",
"index":"analyzed",
"analyzer":"ngram",
"search_analyzer": "default_search"
}
}
}
}
I decided to remove the ngram filter from the "french" analyzer and use a custom ngram-only analyzer for the .ngram subfield. This results in a French-analyzed field and an "original-to-ngram" subfield.
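For completeness, the ngram-only analyzer referenced by the subfield could look like this (a sketch; the exact filter chain is an assumption, only custom_ngram comes from my original settings):

"ngram": {
  "type": "custom",
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "custom_ngram"
  ]
}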
Upvotes: 0
Reputation: 52368
You have no option but to use multi-fields and index that field with a different analyzer that is able to keep the shorter terms as well. Something like this:
"text": {
"type": "string",
"analyzer": "french",
"fields": {
"standard_version": {
"type": "string",
"analyzer": "standard"
}
}
}
Then adjust your queries to also target the text.standard_version field.
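For example, a multi_match query that covers both versions (the query text "iq" is only illustrative):

{
  "query": {
    "multi_match": {
      "query": "iq",
      "fields": [
        "text",
        "text.standard_version"
      ]
    }
  }
}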
Upvotes: 1