Reputation: 572
I needed partial search on my website. Initially I used EdgeNgramField directly, but it didn't work as expected, so I set up a custom search engine with custom analyzers. I am using django-haystack.
'settings': {
    "analysis": {
        "analyzer": {
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_ngram"]
            },
            "edgengram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_edgengram"]
            },
            "suggest_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "standard",
                    "lowercase",
                    "asciifolding"
                ]
            },
        },
        "tokenizer": {
            "haystack_ngram_tokenizer": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15,
            },
            "haystack_edgengram_tokenizer": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15,
                "side": "front"
            }
        },
        "filter": {
            "haystack_ngram": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15
            },
            "haystack_edgengram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15
            }
        }
    }
}
I used edgengram_analyzer for indexing and suggest_analyzer for search. This worked to some extent, but it doesn't work for numbers: for example, when 30 is entered it doesn't find 303. It also fails for words that combine letters and digits. So I searched various sites.
They suggested using the standard or whitespace tokenizer together with the haystack_edgengram filter. But that didn't work at all: leaving numbers aside, partial search stopped working even for alphabetic text. The settings after the suggestion:
'settings': {
    "analysis": {
        "analyzer": {
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_ngram"]
            },
            "edgengram_analyzer": {
                "type": "custom",
                "tokenizer": "whitepsace",
                "filter": ["haystack_edgengram"]
            },
            "suggest_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "standard",
                    "lowercase",
                    "asciifolding"
                ]
            },
        },
        "filter": {
            "haystack_ngram": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15
            },
            "haystack_edgengram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15
            }
        }
    }
}
Does anything other than the lowercase tokenizer work with django-haystack? Or is the haystack_edgengram filter simply not working for me? As far as I know, it should work like this: taking 2 Lazy Dog as the supplied text, the whitespace tokenizer should produce the tokens [2, Lazy, Dog], and applying the haystack_edgengram filter should then generate the tokens [2, la, laz, lazy, do, dog]. It is not working like this. Did I do something wrong?
My requirement is that, for the text 2 Lazy Dog, a search should match when someone types 2 Laz.
Edited:
As far as I can tell, the lowercase tokenizer worked properly, but for the text above it omits 2 and creates the tokens [lazy, dog]. Why can't the standard or whitespace tokenizer work?
Upvotes: 0
Views: 922
Reputation: 572
I found the answer myself, with the help of @jgr's suggestion:
ELASTICSEARCH_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["haystack_ngram"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["haystack_edgengram", "lowercase"]
                },
                "suggest_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase"
                    ]
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 1,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 1,
                    "max_gram": 15
                }
            }
        }
    }
}
ELASTICSEARCH_DEFAULT_ANALYZER = "suggest_analyzer"
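A rough way to see why these settings satisfy the 2 Laz requirement is to simulate the two analyzers in plain Python. This is only a sketch, not what Elasticsearch runs internally: it approximates the standard tokenizer with a whitespace split, assumes every query token has to match (AND semantics), and the function names are made up for illustration.

```python
def index_tokens(text, min_gram=1, max_gram=15):
    """Approximate edgengram_analyzer: split on whitespace, lowercase, emit prefixes."""
    tokens = set()
    for word in text.lower().split():
        # Words shorter than min_gram produce no prefixes at all.
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.add(word[:n])
    return tokens


def search_tokens(query):
    """Approximate suggest_analyzer: split on whitespace and lowercase, no n-grams."""
    return query.lower().split()


def matches(document, query, min_gram=1):
    """Assumed AND semantics: every query token must be among the indexed tokens."""
    indexed = index_tokens(document, min_gram)
    return all(token in indexed for token in search_tokens(query))


print(matches("2 Lazy Dog", "2 Laz"))              # True
print(matches("2 Lazy Dog", "2 Laz", min_gram=2))  # False, "2" was never indexed
print(matches("Room 303", "30"))                   # True
```

The key point is that only the index side is n-grammed. The search side keeps the typed tokens whole, so a partial word matches one of the stored prefixes.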
Upvotes: 0
Reputation: 2931
In an ngram filter, min_gram defines the minimum length of the tokens it creates. In your case '2' has length 1, which is shorter than min_gram, so the ngram filters drop it.
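To make the effect of min_gram concrete, here is a small pure-Python approximation of a whitespace tokenizer followed by lowercasing and an edge n-gram filter. It is a sketch for illustration only: the helper names are invented and it does not call Elasticsearch.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of `token` with length between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text, min_gram, max_gram=15):
    """Approximate a whitespace tokenizer + lowercase + edge n-gram filter chain."""
    tokens = []
    for word in text.split():
        tokens.extend(edge_ngrams(word.lower(), min_gram, max_gram))
    return tokens


print(analyze("2 Lazy Dog", min_gram=2))
# ['la', 'laz', 'lazy', 'do', 'dog']
print(analyze("2 Lazy Dog", min_gram=1))
# ['2', 'l', 'la', 'laz', 'lazy', 'd', 'do', 'dog']
```

With min_gram set to 2 the single-character token 2 yields no prefixes and disappears from the index, which is why a query containing 2 cannot match.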
The easiest way to fix this is to change min_gram to 1. A slightly more complicated approach is to combine a standard analyzer that matches the whole keyword (useful for shorter terms) with an ngram analyzer for partial matching (for longer terms), possibly joined with bool queries.
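As a sketch of the combined approach, the helper below builds a raw Elasticsearch bool query body with two should clauses. The field names text and text_ngram, the boost value, and the function name are assumptions for illustration; your mapping will differ, and this is the plain query DSL rather than anything django-haystack generates for you.

```python
def build_partial_query(user_input, exact_field="text", ngram_field="text_ngram"):
    """Build a bool query that prefers whole-term hits but still accepts n-gram hits.

    `exact_field` is assumed to use a standard-style analyzer and `ngram_field`
    an n-gram analyzer; both names are hypothetical.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Whole-keyword match, boosted so exact hits rank first.
                    {"match": {exact_field: {"query": user_input, "boost": 2.0}}},
                    # Partial match against the n-grammed copy of the field.
                    {"match": {ngram_field: {"query": user_input}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = build_partial_query("2 Laz")
print(body["query"]["bool"]["should"][1])
```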
You can also let the ngrams start from 1 character but require at least 3 letters in your search box before sending the query to Elasticsearch.
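A minimal sketch of such a guard, assuming it runs in the view (or equivalent) that receives the search box input. The function name and the threshold constant are hypothetical.

```python
MIN_QUERY_LENGTH = 3  # assumed threshold, following the "at least 3 letters" suggestion


def should_send_query(user_input, min_length=MIN_QUERY_LENGTH):
    """Return True only when the trimmed input is long enough to be worth searching."""
    return len(user_input.strip()) >= min_length


print(should_send_query("2 Laz"))  # True
print(should_send_query(" 2 "))    # False
```

Note that a threshold of 3 would also block a query such as 30, so the value is a trade-off between noisy one-character matches and short numeric searches.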
Upvotes: 2