Reputation: 630
Sorry for the unclear title, but I'm not sure how to express my problem in a single sentence.
To explain: the application I work on has multiple types of objects that can be tagged by the user. Currently, each tag is a separate row in the database.
For example, let's say I have:
- Object 1 tagged with: tag1, tag2
- Object 2 tagged with: tag2, tag3
- Object 3 tagged with: tag1, tag3, tag4
The table for the tags would be something like this:
id | value | tagged object
1 | tag1 | 1
2 | tag2 | 1
3 | tag2 | 2
4 | tag3 | 2
5 | tag1 | 3
6 | tag3 | 3
7 | tag4 | 3
The tag values are also indexed in Elasticsearch (using the Chewy gem), so the application can offer autocompletion on the tags.
The main problem is that when searching for 'ta', Elasticsearch returns the list tag1, tag2, tag2, tag3, tag1, tag3, tag4, which causes some trouble. Imagine 100 objects tagged with "tag1" and a 101st object tagged with "tag2": since the result set is capped, a search for 'ta' fills up with duplicate "tag1" hits, so "tag2" is never returned (and so never suggested).
What I'd like is for the search query to return: tag1, tag2, tag3, tag4 (I don't really care about the order, that said). Basically, I want to deduplicate the results based on the value of the indexed tag, not on the whole object. I hope my question is clear enough :)
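To illustrate what I mean by "unified", a terms aggregation seems to produce exactly one bucket per distinct value. This is only a sketch of what I imagine: the `unique_tags` aggregation name and the unanalyzed `value.raw` subfield are hypothetical (my index doesn't define them today), and the `response` hash below just mimics the shape of an Elasticsearch aggregation response:

```ruby
# Sketch only: with Chewy, the query might look something like
#
#   SuggestionsIndex.aggregations(
#     unique_tags: { terms: { field: 'value.raw', size: 100 } }
#   )
#
# (field name 'value.raw' is an assumed unanalyzed copy of `value`).
# The aggregation part of the response would then look like this,
# and the suggestion list is just the bucket keys:
response = {
  'aggregations' => {
    'unique_tags' => {
      'buckets' => [
        { 'key' => 'tag1', 'doc_count' => 2 },
        { 'key' => 'tag2', 'doc_count' => 2 },
        { 'key' => 'tag3', 'doc_count' => 2 },
        { 'key' => 'tag4', 'doc_count' => 1 }
      ]
    }
  }
}

suggestions = response['aggregations']['unique_tags']['buckets'].map { |b| b['key'] }
# => ["tag1", "tag2", "tag3", "tag4"]
```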
Thanks in advance :)
As it might help, here's the code used to declare the index and index the elements (and yes, there are two fields indexed, but it was already hard enough to explain with a single one ;) )
class SuggestionsIndex < Chewy::Index
  settings analysis: {
    filter: {
      ngram: {
        type: 'nGram',
        min_gram: 1,
        max_gram: 50,
        token_chars: ['letter', 'digit']
      }
    },
    tokenizer: {
      ngram_tokenizer: {
        type: 'nGram',
        min_gram: 1,
        max_gram: 50,
        token_chars: ['letter', 'digit', 'punctuation', 'symbol']
      }
    },
    analyzer: {
      # ngram indexing allows searching for a substring in words
      ngram: {
        tokenizer: 'ngram_tokenizer',
        filter: ['lowercase', 'asciifolding']
      },
      # when searching, we search for the lowercase words, not the ngram
      lowerascii_search: {
        tokenizer: 'whitespace',
        filter: ['lowercase', 'asciifolding']
      }
    }
  }

  define_type Tag do
    field :key,
      index_analyzer: 'ngram',
      search_analyzer: 'lowerascii_search'
    field :value,
      index_analyzer: 'ngram',
      search_analyzer: 'lowerascii_search'
    field :project_id, type: 'integer'
  end
end
Upvotes: 0
Views: 292
Reputation: 1755
If you want Elasticsearch to search on exact values, then either make the fields not_analyzed, or use the keyword tokenizer (instead of ngram) at the type or index level.
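With Chewy, that could look something like the sketch below: keep the ngram-analyzed field for search-as-you-type, and add an unanalyzed multi-field next to it for exact values. The `raw` subfield name is my choice, and the multi-field block syntax may vary by Chewy version, so treat this as an outline rather than a drop-in replacement:

```ruby
define_type Tag do
  # `value` stays ngram-analyzed so substring search keeps working;
  # `value.raw` (hypothetical name) is stored not_analyzed, so it holds
  # the exact tag string for exact matching or aggregations.
  field :value,
    index_analyzer: 'ngram',
    search_analyzer: 'lowerascii_search' do
    field :raw, index: 'not_analyzed'
  end
end
```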
Upvotes: 0