Indexing lists of keywords in alphabetical order VS not sorting at all in elasticsearch?

Question

I'm using elasticsearch to store lists of keywords with the standard analyzer, like this:

{
   id:1,
   body_color:'silver,blue',
   feature:'wifi,gps'  
},
{
   id:2,
   body_color:'blue,red',
   window_color:'yellow,white',
   feature:'multi core,wifi'
}

Does sorting these lists in alphabetical order, e.g.

{
   id:1,
   body_color:'blue,silver',
   feature:'gps,wifi'  
},
{
   id:2,
   body_color:'blue,red',
   window_color:'white,yellow',
   feature:'multi core,wifi'
}

require a smaller index size because the values are more uniform and have fewer variations? Does it help normalize the tokens?

mel · Accepted Answer

First, as you said, these are keywords and not text, so you should use the keyword type instead of the text type. keyword fields are not analyzed.
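As a rough sketch of what such a mapping could look like, here is the request body built as a Python dict. The field names come from the question; the body shape assumes Elasticsearch 7 or later (no mapping types), and nothing here talks to a cluster.

```python
import json

# Hypothetical mapping body for an index holding the documents from the question.
# Assumption: Elasticsearch 7 or later, where mappings have no type level.
mapping = {
    "mappings": {
        "properties": {
            "body_color": {"type": "keyword"},
            "window_color": {"type": "keyword"},
            "feature": {"type": "keyword"},
        }
    }
}

# This JSON would be sent as the body of the create-index request.
print(json.dumps(mapping, indent=2))
```

A keyword field accepts either a single value or an array of values, so no special "array" type is needed.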

Your document then should look like:

{
   id:1,
   body_color:['silver','blue'],
   feature:['wifi','gps']
},
{
   id:2,
   body_color:['blue','red'],
   window_color:['yellow','white'],
   feature:['multi core','wifi']
}
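If the source data arrives as comma-separated strings like in the question, a small helper could split them into lists before indexing. This is only a sketch; `split_keywords` and `to_keyword_doc` are hypothetical names, not part of any Elasticsearch client.

```python
def split_keywords(value):
    """Split a comma-separated string into a list of trimmed, non-empty keywords."""
    return [part.strip() for part in value.split(",") if part.strip()]


def to_keyword_doc(doc, fields=("body_color", "window_color", "feature")):
    """Return a copy of doc where the listed string fields become lists of keywords."""
    converted = dict(doc)
    for field in fields:
        # Only convert fields that are present and still plain strings.
        if isinstance(converted.get(field), str):
            converted[field] = split_keywords(converted[field])
    return converted


doc = {
    "id": 2,
    "body_color": "blue,red",
    "window_color": "yellow,white",
    "feature": "multi core,wifi",
}
print(to_keyword_doc(doc))
```

Note that 'multi core' stays a single value here, which is the point of using keyword: it is indexed as one term rather than split into 'multi' and 'core'.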

As for your question about sorting: when Elasticsearch analyzes a string, it applies the following steps in order:

Character filters
Tokenizer
Token filters

The character filters remove characters that you don't want to index, HTML tags for example. The tokenizer is then applied to the remaining string and splits it into a list of tokens. The last step, the token filters, removes or modifies certain tokens in the list, stop words for example. Every remaining token is then added to the inverted index, which makes it searchable.
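The three stages can be imitated with a toy pipeline in plain Python. This is not how Elasticsearch implements analysis; the regexes and the stop word set below are simplified stand-ins chosen for illustration.

```python
import re

# Illustrative subset only, not Elasticsearch's stop word list.
STOP_WORDS = {"the", "a", "an", "and", "of"}


def char_filter(text):
    # Replace anything that looks like an HTML tag with a space.
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    # Split on runs of non-word characters (a rough stand-in for a real tokenizer).
    return [t for t in re.split(r"\W+", text) if t]


def token_filters(tokens):
    # Lowercase every token, then drop stop words.
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOP_WORDS]


def analyze(text):
    return token_filters(tokenizer(char_filter(text)))


print(analyze("<b>Multi core</b> and WiFi"))
print(analyze("multi core,wifi"))
```

Both calls yield the tokens multi, core and wifi, which shows why an analyzed text field loses the boundary between 'multi core' and 'wifi'.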

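I don't believe that sorting your keywords will reduce the index size or make indexing more efficient. The inverted index maps each token to the documents that contain it, so a document with the same set of tokens produces the same terms and the same document lists whatever order the tokens appeared in.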
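To illustrate why the order inside a document should not matter, here is a toy inverted index built from the question's feature values in both orders. It is a simplified model (term to sorted document ids), not Lucene's actual on-disk format, and `build_inverted_index` is a hypothetical helper.

```python
from collections import defaultdict


def build_inverted_index(docs, field):
    """Map each keyword to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for keyword in doc.get(field, []):
            index[keyword].add(doc["id"])
    # Terms are kept in sorted order regardless of their order in the documents.
    return {term: sorted(ids) for term, ids in sorted(index.items())}


unsorted_docs = [
    {"id": 1, "feature": ["wifi", "gps"]},
    {"id": 2, "feature": ["multi core", "wifi"]},
]
sorted_docs = [
    {"id": 1, "feature": ["gps", "wifi"]},
    {"id": 2, "feature": ["multi core", "wifi"]},
]

index_a = build_inverted_index(unsorted_docs, "feature")
index_b = build_inverted_index(sorted_docs, "feature")
print(index_a == index_b)
```

In this model the two indexes are identical, so sorting the values beforehand gains nothing.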
