Reputation: 41
I have an elastic search index with following documents and I want to have an autocomplete functionality over the specified fields:
mapping: https://gist.github.com/anonymous/0609b1d110d91dceb9a90faa76d1d5d4
Usecase:
My query is of the form prefix type eg "sta", "star", "star w" .."start war" etc with an additional filter as tags = "science fiction". Also there queries could match other fields like description, actors(in cast field, not this is nested). I also want to know which field it matched to.
I investigated 2 ways for doing that but non of the methods seem to address the usecase above:
1) Suggester autocomplete:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-suggesters-completion.html
With this it seems I have to add another field called "suggest" replicating the data which is not desirable.
2) using a prefix filter/query:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-prefix-filter.html
this gives the whole document back not the exact matching terms.
Is there a clean way of achieving this, please advise.
Upvotes: 0
Views: 2356
Reputation: 329
I created this table for myself:
UseCase | Completion S. | Context S. | Term S. | Phrase S. | search_as_you_type | Edge N-Gram |
---|---|---|---|---|---|---|
Basic Auto-Complete | X | X | X | X | ||
Flexible Search/Query | X | X | ||||
High Performace for Large Datasets | X | X | X | X | ||
Higher Memory Usage | X | X | X | |||
Higher Storage Usage | X | X | ||||
Substring Matches | X | X | ||||
Dynamic Data Updates | X | X | X | X | ||
Relevance Scoring | X | X | X | X | ||
Spell Correction | X | X | ||||
complexity to implement | low | high | medium | high | low | medium |
Speciality | fast prefix matching | context-aware suggestions | single term corrections | multi term corrections | implements edge n-gram, full text partial matching |
differentiate between Query Suggestion and Search
References
ever since the author asked, the search_as_you_type
field was implemented which is exactly what author would have needed back then :D
Upvotes: 0
Reputation: 9
you can use lowercase filter for the elastic index.THis will help you to search upper case letters as well.
Create doc using below settings
PUT lowercase_example
{
"settings": {
"analysis": {
"analyzer": {
"whitespace_lowercase": {
"tokenizer": "whitespace",
"filter": [ "lowercase" ]
}
}
}
},
"mappings": {
"properties": {
"field1": { "type": "text" }
}
}
}
Now when you search you will get both of the fields included irrespective of lowercase and upper case
Upvotes: 0
Reputation: 12672
I think completion suggester
would be the cleanest way but if that is undesirable you could use aggregations on name field.
This is a sample index(I am assuming you are using ES 1.7 from your question
PUT netflix
{
"settings": {
"analysis": {
"analyzer": {
"prefix_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim",
"edge_filter"
]
},
"keyword_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim"
]
}
},
"filter": {
"edge_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
}
}
},
"mappings": {
"movie":{
"properties": {
"name":{
"type": "string",
"fields": {
"prefix":{
"type":"string",
"index_analyzer" : "prefix_analyzer",
"search_analyzer" : "keyword_analyzer"
},
"raw":{
"type": "string",
"analyzer": "keyword_analyzer"
}
}
},
"tags":{
"type": "string", "index": "not_analyzed"
}
}
}
}
}
Using multi-fields, name field is analyzed in different ways. name.prefix is using keyword tokenizer with edge ngram filter so that string star wars can be broken into s, st, sta etc. but while searching, keyword_analyzer is used so that search query does not get broken into multiple small tokens. name.raw will be used for aggregation.
The following query will give top 10 suggestions.
GET netflix/movie/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"tags": "sci-fi"
}
},
"query": {
"match": {
"name.prefix": "sta"
}
}
}
},
"size": 0,
"aggs": {
"unique_movie_name": {
"terms": {
"field": "name.raw",
"size": 10
}
}
}
}
Results will be something like
"aggregations": {
"unique_movie_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "star trek",
"doc_count": 1
},
{
"key": "star wars",
"doc_count": 1
}
]
}
}
UPDATE :
You could use highlighting for this purpose I think. Highlight section will get you the whole word and which field it matched. You can also use inner hits and highlighting inside it to get nested docs also.
{
"query": {
"query_string": {
"query": "sta*"
}
},
"_source": false,
"highlight": {
"fields": {
"*": {}
}
}
}
Upvotes: 1
Reputation: 1061
Don't create mapping separately, insert data directly into index. It will create default mapping for that. Use below query for autocomplete.
GET /netflix/movie/_search
{
"query": {
"query_string": {
"query": "sta*"
}
}
}
Upvotes: 1