Vincent

Reputation: 630

Elasticsearch & chewy: make results unique based on item content

Sorry for the unclear title, but I'm not sure how to express my problem in a simple sentence.

To explain: the application I work on currently has multiple types of objects that can be linked to tags added by the user. Currently, each tag is a separate entry in the database.

Let's say, for example, I have:

- Object 1 tagged with: tag1, tag2
- Object 2 tagged with: tag2, tag3
- Object 3 tagged with: tag1, tag3, tag4

The table for the tags would be something like this:

id | value | tagged object
 1 | tag1  | 1
 2 | tag2  | 1
 3 | tag2  | 2
 4 | tag3  | 2
 5 | tag1  | 3
 6 | tag3  | 3
 7 | tag4  | 3

The values of the tags are also indexed in Elasticsearch (using the chewy gem) so the application can offer autocompletion on the tags.

The main problem is that, when searching for 'ta', Elasticsearch returns the list: tag1, tag2, tag2, tag3, tag1, tag3, tag4, which causes trouble. Imagine we have 100 objects tagged with "tag1" and a 101st tagged with "tag2": if I search 'ta', "tag2" will not be returned (and so will not be suggested).

What I'd like is for the search query to return: tag1, tag2, tag3, tag4 (I don't really care about the order, that said), so basically to deduplicate the results based on the value of the indexed tag, not the whole document. I hope my question is clear enough :)
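To put it in code, the deduplication I'm after amounts to this (plain Ruby, using the hit list from the example above):

```ruby
# The raw hits returned for 'ta', one per tag row in the database:
hits = ['tag1', 'tag2', 'tag2', 'tag3', 'tag1', 'tag3', 'tag4']

# What I'd like the search to give me instead: one entry per distinct value.
suggestions = hits.uniq
# => ["tag1", "tag2", "tag3", "tag4"]
```

Doing this client-side works, but it doesn't solve the window problem described above (a value can be pushed out of the result set before deduplication ever happens), which is why I'd like the uniqueness handled at search time.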

Thanks in advance :)

In case it helps, here's the code used for declaring the index and indexing the elements (and yes, there are two fields indexed, but it was already hard enough to explain with a single one ;) )

class SuggestionsIndex < Chewy::Index
  settings analysis: {
    filter: {
      ngram: {
        type: 'nGram',
        min_gram: 1,
        max_gram: 50,
        token_chars: [ 'letter', 'digit']
      }
    },
    tokenizer: {
      ngram_tokenizer: {
        type: 'nGram',
        min_gram: 1,
        max_gram: 50,
        token_chars: [ 'letter', 'digit', 'punctuation', 'symbol']
      }
    },
    analyzer: {
      # ngram indexing allows searching for a substring in words
      ngram: {
        tokenizer: 'ngram_tokenizer',
        filter: ['lowercase', 'asciifolding']
      },
      # when searching, we search for the lowercase words, not the ngram
      lowerascii_search: {
        tokenizer: 'whitespace',
        filter: ['lowercase', 'asciifolding']
      }
    }
  }

  define_type Tag do
    field :key,
      index_analyzer: 'ngram',
      search_analyzer: 'lowerascii_search'
    field :value,
      index_analyzer: 'ngram',
      search_analyzer: 'lowerascii_search'
    field :project_id, type: 'integer'
  end
end
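To illustrate why the duplicates show up: the `ngram_tokenizer` above indexes every substring of every tag value, so a search for 'ta' matches each database row whose value contains it, one hit per row. A rough simulation in plain Ruby (an approximation for illustration, not Elasticsearch's actual tokenizer code):

```ruby
# Approximate what an nGram tokenizer with min_gram 1 emits for a term:
# every contiguous substring between min and max characters long.
def ngrams(term, min = 1, max = 50)
  (0...term.length).flat_map do |start|
    (min..max).filter_map do |len|
      term[start, len] if start + len <= term.length
    end
  end
end

ngrams('tag1')
# => ["t", "ta", "tag", "tag1", "a", "ag", "ag1", "g", "g1", "1"]
```

Since "ta" is among the indexed grams of every tag starting with "ta", each of the seven rows in the table above is an independent match.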

Upvotes: 0

Views: 292

Answers (1)

devlearn

Reputation: 1755

If you want Elasticsearch to search on exact values, then either make the fields not_analyzed, or use the keyword tokenizer (instead of ngram) at the type or index level.
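As a sketch of how a not_analyzed field could then be used to collapse duplicates server-side (untested here; the `value.raw` sub-field name is an assumption — such a sub-field would have to be added to the mapping alongside the existing ngram field), a terms aggregation returns each distinct value once:

```ruby
require 'json'

# Sketch of a search body: match on the ngram-analyzed field as before,
# but collect the distinct raw values via a terms aggregation.
# size: 0 skips the per-row hits, since only the buckets are needed.
body = {
  size: 0,
  query: { match: { value: 'ta' } },
  aggs: {
    unique_tags: {
      terms: { field: 'value.raw', size: 50 }
    }
  }
}

puts JSON.generate(body)
```

Each bucket in the `unique_tags` aggregation then holds one distinct tag value, regardless of how many rows share it.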

Upvotes: 0
