Reputation: 167
I am using an edge n-gram tokenizer in order to provide partial matching. My documents look like this:
Name
Labson Series LTD 2014
Labson PLO LTD 2014A
Labson PLO LTD 2014-I
Labson PLO LTD. 2014-II
My mapping is as follows:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 40,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
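To check what ends up in the index for a given title, the output of the autocomplete analyzer defined above can be inspected with the _analyze API (a quick sanity check against this index):

GET my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "Labson PLO LTD 2014A"
}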
PUT my_index/doc/1
{
  "title": "Labson Series LTD 2014"
}

PUT my_index/doc/2
{
  "title": "Labson PLO LTD 2014A"
}

PUT my_index/doc/3
{
  "title": "Labson PLO LTD 2014-I"
}

PUT my_index/doc/4
{
  "title": "Labson PLO LTD. 2014-II"
}
The following query gives me 3 documents, which is correct (Labson PLO LTD 2014A, Labson PLO LTD 2014-I, Labson PLO LTD. 2014-II):
GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "labson plo",
        "operator": "and"
      }
    }
  }
}
But when I type in Labson PLO 2014A, it gives me 0 documents:
GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Labson PLO 2014A",
        "operator": "and"
      }
    }
  }
}
I expect this to return 1 document (Labson PLO LTD 2014A), but for some reason it seems like the digits are not being indexed in the tokens. Let me know if I am missing anything here.
Upvotes: 0
Views: 261
Reputation: 3937
In your autocomplete_search analyzer you are using the lowercase tokenizer, which performs the function of the Letter Tokenizer and the Lower Case Token Filter together:

https://www.elastic.co/guide/en/elasticsearch/reference/2.3//analysis-lowercase-tokenizer.html
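To see that combination in action, you can run the lowercase tokenizer over the search input directly with the _analyze API:

POST _analyze
{
  "tokenizer": "lowercase",
  "text": "Labson PLO 2014A"
}

This returns the terms labson, plo and a: the digits are treated as separators and discarded.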
Now let's see what the Letter Tokenizer does:

"The letter tokenizer breaks text into terms whenever it encounters a character which is not a letter."

https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-letter-tokenizer.html
So in your case, when you queried

"query": "Labson PLO 2014A",

the query actually becomes

"+title:labson +title:plo +title:a"

because the letter tokenizer split 2014A on the digits and dropped 2014, keeping only the trailing a. Your index does not contain a token consisting of just the letter a (min_gram is 2), which is why you did not get any results back.
You can analyze your query like this in Kibana:
POST my_index/_validate/query?explain
{
  "query": {
    "match": {
      "title": {
        "query": "Labson PLO 2014a",
        "operator": "and"
      }
    }
  }
}
and you will see that 2014 is getting dropped from the final query.
Also, to see what the letter tokenizer produces, use the following query:
POST _analyze
{
  "tokenizer": "letter",
  "text": "Labson PLO LTD 2014a"
}
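If you want the search analyzer to keep digits, one option is to define autocomplete_search as a custom analyzer that uses the standard tokenizer with a lowercase filter (a minimal sketch, assuming the rest of the settings stay as above; this replaces the autocomplete_search definition in the analyzer section):

"autocomplete_search": {
  "tokenizer": "standard",
  "filter": [
    "lowercase"
  ]
}

With this, "Labson PLO 2014A" analyzes to labson, plo and 2014a, each of which matches an edge n-gram produced at index time.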
Upvotes: 1