perseverance

Reputation: 6612

What indexing tokenizer should be used for an array field in Elasticsearch?

I have a keyword field of type array that is generated when the object is created. What tokenizer should I use for indexing? I couldn't find this information on elasticsearch.org.

keyword value (array): ['george', 'apple', 'eats', 'new', 'york']

Upvotes: 1

Views: 980

Answers (1)

javanna

Reputation: 60235

It all depends on your data and what you want to do with it. For example, can a keyword be composed of multiple words? If so, do you want a single word to match while searching, or not? Also, do you want matching to be case-sensitive?

If you only want exact, case-sensitive matches, you don't even need to analyze the field: you can configure it as index: not_analyzed in your mapping.
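A minimal mapping sketch for that case might look like the following. The index name (my_index), type name (my_type), and field name (keyword) are placeholders, and the syntax is the string/not_analyzed form from the Elasticsearch 1.x/2.x era this answer refers to; in Elasticsearch 5.0 and later the equivalent is a field of type keyword.

```json
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "keyword": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

Note that arrays need no special mapping in Elasticsearch: any field can hold one or more values of its type, so this same mapping accepts ['george', 'apple', 'eats', 'new', 'york'].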

If you don't want it to be case-sensitive, you can analyze it with the keyword tokenizer, which performs no tokenization, combined with the lowercase token filter.
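That combination can be sketched as a custom analyzer, assuming hypothetical names (my_index, my_type, lowercase_keyword); the keyword tokenizer and lowercase filter themselves are built in:

```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "keyword": {
          "type": "string",
          "analyzer": "lowercase_keyword"
        }
      }
    }
  }
}
```

With this analyzer, a value like "New York" is indexed as the single lowercased token "new york", so searches match regardless of case but still require the whole keyword.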

If a keyword can be composed of more than one word and you want every single word to match, you need to tokenize it, for example with the whitespace tokenizer or even the default standard analyzer.
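For that case, a sketch using the built-in whitespace analyzer (again with placeholder index, type, and field names) could be:

```json
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "keyword": {
          "type": "string",
          "analyzer": "whitespace"
        }
      }
    }
  }
}
```

Here "New York" is split into the tokens "New" and "York", so a search for a single word matches. If you omit the analyzer setting entirely, the standard analyzer is applied by default, which also lowercases and strips most punctuation.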

Upvotes: 2
