Vincent

Reputation: 630

Elasticsearch & chewy: make results unique based on item content

Sorry for the unclear title, but I'm not sure how to express my problem in a simple sentence.

To explain: the application I work on currently has multiple types of objects that can be linked to tags added by the user. Currently, each tag is a separate entry in the database.

Let's say, for example, I have:

- Object 1 tagged with: tag1, tag2
- Object 2 tagged with: tag2, tag3
- Object 3 tagged with: tag1, tag3, tag4

The table for the tags would be something like this:

id | value | tagged object
 1 | tag1  | 1
 2 | tag2  | 1
 3 | tag2  | 2
 4 | tag3  | 2
 5 | tag1  | 3
 6 | tag3  | 3
 7 | tag4  | 3

The values of the tags are also indexed in Elasticsearch (using the chewy gem) so the application can offer autocompletion on the tags.

The main problem is that, when searching for 'ta', Elasticsearch returns the list: tag1, tag2, tag2, tag3, tag1, tag3, tag4, which causes trouble. Imagine we have 100 objects tagged with "tag1" and a 101st tagged with "tag2": if I search 'ta', "tag2" will not be returned (and so will not be suggested).

What I'd like is for the search query to return: tag1, tag2, tag3, tag4 (I don't really care about the order, that said), so basically to deduplicate the results based on the value of the indexed tag, not the whole document. I hope my question is clear enough :)
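To put it in code, the deduplication I'm after amounts to this (plain Ruby, using the hit list from the example above):

```ruby
# The raw hits returned for 'ta', one per tag row in the database:
hits = ['tag1', 'tag2', 'tag2', 'tag3', 'tag1', 'tag3', 'tag4']

# What I'd like the search to give me instead: one entry per distinct value.
suggestions = hits.uniq
# => ["tag1", "tag2", "tag3", "tag4"]
```

Doing this client-side works, but it doesn't solve the window problem described above (a value can be pushed out of the result set before deduplication ever happens), which is why I'd like the uniqueness handled at search time.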

Thanks in advance :)

In case it helps, here's the code used for declaring the index and indexing the elements (and yes, there are two fields indexed, but it was already hard enough to explain with a single one ;) )

class SuggestionsIndex < Chewy::Index
  settings analysis: {
    filter: {
      ngram: {
        type: 'nGram',
        min_gram: 1,
        max_gram: 50,
        token_chars: [ 'letter', 'digit']
      }
    },
    tokenizer: {
      ngram_tokenizer: {
        type: 'nGram',
        min_gram: 1,
        max_gram: 50,
        token_chars: [ 'letter', 'digit', 'punctuation', 'symbol']
      }
    },
    analyzer: {
      # ngram indexing allows searching for a substring in words
      ngram: {
        tokenizer: 'ngram_tokenizer',
        filter: ['lowercase', 'asciifolding']
      },
      # when searching, we search for the lowercase words, not the ngram
      lowerascii_search: {
        tokenizer: 'whitespace',
        filter: ['lowercase', 'asciifolding']
      }
    }
  }

  define_type Tag do
    field :key,
      index_analyzer: 'ngram',
      search_analyzer: 'lowerascii_search'
    field :value,
      index_analyzer: 'ngram',
      search_analyzer: 'lowerascii_search'
    field :project_id, type: 'integer'
  end
end
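To illustrate why the duplicates show up: the `ngram_tokenizer` above indexes every substring of every tag value, so a search for 'ta' matches each database row whose value contains it, one hit per row. A rough simulation in plain Ruby (an approximation for illustration, not Elasticsearch's actual tokenizer code):

```ruby
# Approximate what an nGram tokenizer with min_gram 1 emits for a term:
# every contiguous substring between min and max characters long.
def ngrams(term, min = 1, max = 50)
  (0...term.length).flat_map do |start|
    (min..max).filter_map do |len|
      term[start, len] if start + len <= term.length
    end
  end
end

ngrams('tag1')
# => ["t", "ta", "tag", "tag1", "a", "ag", "ag1", "g", "g1", "1"]
```

Since "ta" is among the indexed grams of every tag starting with "ta", each of the seven rows in the table above is an independent match.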

Upvotes: 0

Views: 292

Answers (1)

devlearn

Reputation: 1755

If you want Elasticsearch to search on exact values, then either make the fields not_analyzed, or use the keyword tokenizer (instead of ngram) at the type or index level.
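As a sketch of how a not_analyzed field could then be used to collapse duplicates server-side (untested here; the `value.raw` sub-field name is an assumption — such a sub-field would have to be added to the mapping alongside the existing ngram field), a terms aggregation returns each distinct value once:

```ruby
require 'json'

# Sketch of a search body: match on the ngram-analyzed field as before,
# but collect the distinct raw values via a terms aggregation.
# size: 0 skips the per-row hits, since only the buckets are needed.
body = {
  size: 0,
  query: { match: { value: 'ta' } },
  aggs: {
    unique_tags: {
      terms: { field: 'value.raw', size: 50 }
    }
  }
}

puts JSON.generate(body)
```

Each bucket in the `unique_tags` aggregation then holds one distinct tag value, regardless of how many rows share it.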

Upvotes: 0
