Reputation: 41745
In Korean, a city name can have a suffix attached to it. It's like Newyorkcity: people use either Newyork or Newyorkcity. I'd like to create analyzers (index/search) so that when people search for either newyork or newyorkcity, I can return all the newyork-related documents.
I was looking at the pattern tokenizer and thought I could do this with
"tokenizer": ["whitespace", "my_pattern_tokenizer"]
But then I found out that an analyzer can have only one tokenizer. How can I achieve what I want?
Upvotes: 1
Views: 340
Reputation: 749
PUT index_name
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "analyzer": "ngram_analyzer",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "token_chars": ["letter", "digit"],
          "min_gram": 3,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer"
        }
      }
    }
  }
}
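You can verify what this analyzer emits with the _analyze API (a quick sanity check, assuming the index_name index above was created):

```json
GET index_name/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "Newyorkcity"
}
```

The response lists every 3-to-20-character gram of Newyorkcity, one of which is Newyork itself, which is why a match query for Newyork finds the document. Note that this analyzer has no lowercase filter, so matching is case-sensitive.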
Search for Newyork or Newyorkcity
GET index_name/_search
{
  "query": {
    "match": {
      "city": "Newyork"
    }
  }
}
GET index_name/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "city": "Newyorkcity"
          }
        },
        {
          "match": {
            "city.raw": "Newyorkcity"
          }
        }
      ]
    }
  }
}
Upvotes: 0
Reputation: 22316
I don't recommend using an ngram analyzer here: the results can be unstable, and expanding every term into grams causes massive index bloat. Your idea is on the right track, though; here is how I would do it.
Start by creating a custom analyzer that uses a pattern_replace char filter:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"],
          "char_filter": ["my_city_char_filter"]
        }
      },
      "char_filter": {
        "my_city_char_filter": {
          "type": "pattern_replace",
          "pattern": "city",
          "replacement": ""
        }
      }
    }
  }
}
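Before indexing anything you can sanity-check the analyzer with the _analyze API (a sketch, assuming an index named index was created with the settings above):

```json
GET index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Newyorkcity"
}
```

The char filter strips city and the lowercase filter normalizes case, so both Newyork and Newyorkcity reduce to the single token newyork. Because the same analyzer runs at both index and search time, either spelling matches documents indexed with either form.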
In your mapping:
"city": {
  "type": "text",
  "analyzer": "my_analyzer"
}
Now your data is ready to be queried simply using:
GET index/_search
{
  "query": {
    "match": {
      "city": "Newyorkcity"
    }
  }
}
Upvotes: 1