Ruslan F.

Reputation: 5776

How to preserve original term during transliteration in Elasticsearch with ICU plugin?

I'm using the following ICU transform filter to perform transliteration:

"transliterate": {
    "type": "icu_transform",
    "id": "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"
}

The problem is that this filter replaces the original term in the index, so searching in the native language is not possible with a terms query like this:

{
  "terms": {
    "field": [
      "term"
    ],
    "boost": 1.0
  }
}

Is there any way to make the icu_transform filter produce two terms: the original one and the transliterated one?

If not, I think the best solution would be a mapping that uses copy_to to populate another field, with an analyzer for that field that omits the transliterate filter. Can you suggest something more efficient?

I'm using Elasticsearch 5.6.4

Upvotes: 1

Views: 646

Answers (1)

Chin Huang

Reputation: 13810

Multi-fields allow you to index the same source value to different fields in different ways. You can index to a field with the standard analyzer and to another field with an analyzer that applies the ICU transform filter. For example,

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "latin": {
              "type": "text",
              "analyzer": "latin"
            }
          }
        }
      }
    }
  }
}
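
Note that the mapping refers to an analyzer named latin, which has to exist in the index settings when the index is created. A minimal sketch of a complete create-index request follows. It reuses the transliterate filter from the question; the standard tokenizer and the lowercase filter are assumptions, since the question does not show the rest of the analysis chain, and the ICU analysis plugin must be installed.

```json
# Sketch only: the tokenizer and the lowercase filter are assumed, not taken from the question.
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "transliterate": {
          "type": "icu_transform",
          "id": "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"
        }
      },
      "analyzer": {
        "latin": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "transliterate"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "latin": {
              "type": "text",
              "analyzer": "latin"
            }
          }
        }
      }
    }
  }
}
```

With this layout, my_field keeps the original tokens (analyzed with the default standard analyzer) and my_field.latin holds the transliterated ones, so no copy_to field is needed.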

Then you can query either my_field (original terms) or my_field.latin (transliterated terms).
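
As an illustration, one way to search both sub-fields at once is a multi_match query, which analyzes the query text with each field's own analyzer. The search text below is a hypothetical example, not something from the question.

```json
# Hypothetical search text; each field applies its own analyzer to it.
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "Москва",
      "fields": ["my_field", "my_field.latin"]
    }
  }
}
```

A Latin-script spelling of the same word should then match through my_field.latin, while the native-script form can match either field. Keep in mind that term-level queries such as term and terms are not analyzed, so with those the query value has to equal the indexed token exactly.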

Upvotes: 1
