Evaldas Buinauskas

Reputation: 14077

Elasticsearch indexing with elisions

How would I index words such as L'Oréal in Elasticsearch?

A user might type it in a couple of ways:

  1. Loreal
  2. L'Oreal
  3. L'Oréal

Ideally, I'd like all of them to produce the token loreal, and I'd rather not handle each such keyword manually.

The Elision Token Filter seems useful, but it would only work for the 2nd and 3rd cases.

Any ideas on how to make all of these keywords produce the same token, loreal?

Upvotes: 0

Views: 178

Answers (1)

Val

Reputation: 217314

The elision token filter actually removes the specified articles, so the leading l never makes it into the token: L'Oréal ends up as oreal once lowercased and folded, and you'll never get loreal.
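To make that concrete, here is a rough Python sketch of what an elision step does to a single token. It is only an emulation for illustration, not Elasticsearch's implementation; the `elide` helper and the `ARTICLES` set (a subset of the French defaults, as far as I recall them) are assumptions of this sketch.

```python
# Rough emulation of an elision filter, for illustration only.
# ARTICLES is an assumed subset of the French elision defaults.
ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c"}


def elide(token):
    """Drop a leading article and its apostrophe, if the token has one."""
    head, sep, tail = token.partition("'")
    if sep and head.lower() in ARTICLES:
        return tail  # the article (and the apostrophe) are discarded
    return token


print(elide("L'Or\u00e9al"))  # the leading "L" is gone
print(elide("Loreal"))        # no apostrophe, so nothing is removed
```

The two inputs end up as different tokens, which is why elision alone cannot map all three spellings to loreal.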

What I suggest instead is a mapping character filter that strips the apostrophe, combined with the asciifolding and lowercase token filters:

PUT test_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "my_analyzer": {
               "tokenizer": "standard",
               "filter": [
                  "asciifolding",
                  "lowercase"
               ],
               "char_filter": [
                  "apostrophe"
               ]
            }
         },
         "char_filter": {
            "apostrophe": {
               "type": "mapping",
               "mappings": [
                  "'=>"
               ]
            }
         }
      }
   }
}

With my_analyzer, all the input strings you've specified will be transformed into the loreal token.

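curl -XGET 'localhost:9200/test_index/_analyze?pretty' -H 'Content-Type: application/json' -d "{\"analyzer\": \"my_analyzer\", \"text\": \"Loreal\"}"
=> loreal

curl -XGET 'localhost:9200/test_index/_analyze?pretty' -H 'Content-Type: application/json' -d "{\"analyzer\": \"my_analyzer\", \"text\": \"L'Oreal\"}"
=> loreal

curl -XGET 'localhost:9200/test_index/_analyze?pretty' -H 'Content-Type: application/json' -d "{\"analyzer\": \"my_analyzer\", \"text\": \"L'Oréal\"}"
=> loreal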
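If you want to see why the three inputs converge without spinning up a cluster, the analyzer chain can be approximated in a few lines of Python. This is only a sketch under loose assumptions: the regex stands in for the standard tokenizer, Unicode decomposition stands in for asciifolding, and the function names are made up for this example.

```python
import re
import unicodedata


def strip_apostrophes(text):
    # Plays the role of the "apostrophe" mapping char filter ("'=>").
    return text.replace("'", "")


def ascii_fold(token):
    # Approximates asciifolding: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    # Same order as the analyzer: char filter, tokenizer, token filters.
    text = strip_apostrophes(text)
    tokens = re.findall(r"\w+", text)  # crude stand-in for the standard tokenizer
    return [ascii_fold(token).lower() for token in tokens]


for query in ["Loreal", "L'Oreal", "L'Or\u00e9al"]:
    print(query, "=>", analyze(query))  # each prints ['loreal']
```

The apostrophe has to go before tokenization, which is why it is handled in a char filter rather than a token filter.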

Upvotes: 1
