Reputation: 189
I am using Elasticsearch 5.4 to implement suggestion/completion functionality and am having trouble choosing the right tokenizer for my requirements. Below is an example:
There are 5 documents in the index, with the following content:
DOC 1: Applause
DOC 2: Apple
DOC 3: It is an Apple
DOC 4: Applications
DOC 5: There is_an_appl
Queries
Query 1: Query String 'App' should return all 5 documents.
Query 2: Query String 'Apple' should return only document 2 and document 3.
Query 3: Query String 'Applications' should return only document 4.
Query 4: Query String 'appl' should return all 5 documents.
Tokenizer
I am using the following tokenizer in Elasticsearch, and I am seeing all documents returned for Query 2 and Query 3.
The analyzer is applied to fields of type 'text'.
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer"
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "ngram",
"min_gram": "3",
"max_gram": "3",
"token_chars": [
"letter",
"digit"
]
}
}
}
}
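For reference, this is how the analyzer is wired to the field (the mapping type and field names here are shortened placeholders for the example):

"mappings": {
  "my_type": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_ngram_analyzer"
      }
    }
  }
}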
How can I restrict the results to documents that contain an exact match of the query string, whether as part of an existing word, within a phrase, or as a standalone word? (The expected results are listed in the queries above.)
Upvotes: 1
Views: 118
Reputation: 217274
That's because you're using an ngram tokenizer instead of an edge_ngram one. The latter only indexes prefixes, while the former indexes prefixes, suffixes, and also sub-parts of your data.
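You can check this with the _analyze API against your index (assuming it is called my_index; the name is just for illustration):

GET my_index/_analyze
{
  "tokenizer": "my_ngram_tokenizer",
  "text": "Applause"
}

With your current ngram tokenizer this returns the tokens App, ppl, pla, lau, aus, use (every 3-character window); with an edge_ngram tokenizer and the same settings it returns only the prefix App. The shared App and ppl tokens are why a search for 'Apple' (tokenized as App, ppl, ple) also matches 'Applause'.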
Change your analyzer definition to this instead and it should work as expected:
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer"
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "edge_ngram", <---- change this
"min_gram": "3",
"max_gram": "3",
"token_chars": [
"letter",
"digit"
]
}
}
}
}
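Once the index has been recreated with the new settings and the documents reindexed, a simple match query runs the query string through the same analyzer at search time (index and field names are placeholders):

GET my_index/_search
{
  "query": {
    "match": {
      "content": "appl"
    }
  }
}

Here 'appl' is tokenized to the single prefix App, which matches the App token indexed for every document, which is the behaviour you expect for Query 4.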
Upvotes: 1