Reputation: 14077
How would I index words such as L'Oréal in Elasticsearch?
A user might type it in a couple of ways:
Ideally, I'd like all of them to output loreal. I wouldn't like to do this manually for each exceptional keyword.
The Elision Token Filter seems useful, but it would work only for the 2nd and 3rd cases.
Any ideas on how I could make all of these keywords output the same token, loreal?
Upvotes: 0
Views: 178
Reputation: 217314
The elision token filter will actually remove the specified articles, so you'll never have loreal in your token, i.e. the first l will never make it.
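To see why, here is a rough Python sketch of what an elision-style filter does to a token. It is not Elasticsearch code: the elide function and the ARTICLES list are hypothetical names chosen for illustration, and the article list is only an assumed sample.

```python
# Assumed sample of articles; a real elision filter has its own configurable list.
ARTICLES = ("l", "m", "t", "qu", "n", "s", "j", "d")

import re

def elide(token: str) -> str:
    """Drop a leading article plus apostrophe, e.g. l'avion -> avion."""
    pattern = "^(?:" + "|".join(ARTICLES) + ")['\u2019]"
    return re.sub(pattern, "", token, flags=re.IGNORECASE)

print(elide("L'Oréal"))  # the leading L' is removed
print(elide("Loreal"))   # no apostrophe, so nothing is removed
```

The apostrophe forms lose their leading l while the plain form keeps it, so the variants cannot end up as the same token.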
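What I suggest instead is the following, using a combination of the asciifolding and lowercase token filters, plus a mapping char filter that strips the apostrophe before tokenization: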
PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "asciifolding",
            "lowercase"
          ],
          "char_filter": [
            "apostrophe"
          ]
        }
      },
      "char_filter": {
        "apostrophe": {
          "type": "mapping",
          "mappings": [
            "'=>"
          ]
        }
      }
    }
  }
}
With my_analyzer, all the input strings you've specified will be transformed into the loreal token.
curl -XGET 'localhost:9200/test_index/_analyze?analyzer=my_analyzer&pretty' -d "Loreal"
=> loreal
curl -XGET 'localhost:9200/test_index/_analyze?analyzer=my_analyzer&pretty' -d "L'Oreal"
=> loreal
curl -XGET 'localhost:9200/test_index/_analyze?analyzer=my_analyzer&pretty' -d "L'Oréal"
=> loreal
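The reason all three inputs converge can be approximated in plain Python. This is only a sketch of the analysis chain, not Elasticsearch itself: the analyze function is a hypothetical name, the standard tokenizer is assumed to behave like a whitespace split for these inputs, and asciifolding is approximated by Unicode decomposition, which covers accented letters such as é but not every character the real filter folds.

```python
import unicodedata
from typing import List

def analyze(text: str) -> List[str]:
    # Char filter step: the "'=>" mapping deletes every apostrophe.
    stripped = text.replace("'", "")
    # Tokenizer step (assumption): whitespace split stands in for "standard".
    tokens = stripped.split()
    result = []
    for token in tokens:
        # asciifolding approximation: decompose, then drop combining marks.
        decomposed = unicodedata.normalize("NFKD", token)
        folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        # lowercase step.
        result.append(folded.lower())
    return result

for variant in ("Loreal", "L'Oreal", "L'Oréal"):
    print(variant, "=>", analyze(variant))
```

Because the apostrophe is deleted before tokenization, the leading L stays attached to the rest of the word, which is the key difference from the elision approach.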
Upvotes: 1