user3784881

Reputation: 41

Autocomplete functionality using elastic search

I have an Elasticsearch index with the following documents, and I want autocomplete functionality over the specified fields:

mapping: https://gist.github.com/anonymous/0609b1d110d91dceb9a90faa76d1d5d4

Use case:

My queries are typed as prefixes, e.g. "sta", "star", "star w" ... "star war", with an additional filter such as tags = "science fiction". The queries could also match other fields, such as description or actors (in the cast field; note that this is nested). I also want to know which field was matched.

I investigated two ways of doing this, but neither method seems to address the use case above:

1) Suggester autocomplete:

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-suggesters-completion.html

With this, it seems I have to add another field called "suggest" that replicates the data, which is not desirable.

2) Prefix filter/query:

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-prefix-filter.html

This returns the whole document, not the exact matching terms.

Is there a clean way of achieving this? Please advise.

Upvotes: 0

Views: 2356

Answers (4)

papierkorp

Reputation: 329

I created this table for myself:

UseCase Completion S. Context S. Term S. Phrase S. search_as_you_type Edge N-Gram
Basic Auto-Complete X X X X
Flexible Search/Query X X
High Performance for Large Datasets X X X X
Higher Memory Usage X X X
Higher Storage Usage X X
Substring Matches X X
Dynamic Data Updates X X X X
Relevance Scoring X X X X
Spell Correction X X
complexity to implement low high medium high low medium
Speciality fast prefix matching context-aware suggestions single term corrections multi term corrections implements edge n-gram, full text partial matching

Note: differentiate between query suggestion and search.

References

Since this question was asked, the search_as_you_type field type has been implemented, which is exactly what the author would have needed back then :D
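To make the search_as_you_type idea concrete, here is a minimal sketch that only builds the request bodies as Python dicts. It assumes Elasticsearch 7.2 or later, and a hypothetical layout with a "name" field and a keyword "tags" field (neither is taken from the asker's gist). The bool_prefix multi_match over the field and its `_2gram`/`_3gram` subfields is the query shape this field type is normally paired with.

```python
import json

# Hypothetical index layout (assumption): "name" is typed as
# search_as_you_type, "tags" is a keyword field used for filtering.
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "search_as_you_type"},
            "tags": {"type": "keyword"},
        }
    }
}


def build_query(text, tag):
    """Build a bool_prefix multi_match over the field and its shingle subfields,
    restricted to documents carrying the given tag."""
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": text,
                        "type": "bool_prefix",
                        "fields": ["name", "name._2gram", "name._3gram"],
                    }
                },
                "filter": {"term": {"tags": tag}},
            }
        }
    }


if __name__ == "__main__":
    # Print the bodies so they can be pasted into a console or sent with any client.
    print(json.dumps(mapping, indent=2))
    print(json.dumps(build_query("star w", "science fiction"), indent=2))
```

The printed bodies would go to the index-creation and _search endpoints respectively; no client library is assumed here.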

Upvotes: 0

suresh chaudhari

Reputation: 9

You can use a lowercase filter for the Elasticsearch index. This will let your searches match uppercase letters as well.

Create the index using the settings below:

PUT lowercase_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "tokenizer": "whitespace",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text", "analyzer": "whitespace_lowercase" }
    }
  }
}

Now when you search on field1, you will get matches regardless of whether the text is lowercase or uppercase.

Upvotes: 0

ChintanShah25

Reputation: 12672

I think the completion suggester would be the cleanest way, but if that is undesirable you could use aggregations on the name field.

This is a sample index (I am assuming you are using ES 1.7, going by the links in your question):

PUT netflix
{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim",
            "edge_filter"
          ]
        },
        "keyword_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "movie":{
      "properties": {
        "name":{
          "type": "string",
          "fields": {
            "prefix":{
              "type": "string",
              "index_analyzer": "prefix_analyzer",
              "search_analyzer": "keyword_analyzer"
            },
            "raw":{
              "type": "string",
              "analyzer": "keyword_analyzer"
            }
          }
        },
        "tags":{
          "type": "string", "index": "not_analyzed"
        }
      }
    }
  }
}

Using multi-fields, the name field is analyzed in different ways. name.prefix uses the keyword tokenizer with an edge n-gram filter at index time, so that the string "star wars" is broken into s, st, sta, etc. At search time, keyword_analyzer is used instead, so that the search query is not broken into multiple small tokens. name.raw is used for the aggregation.
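To see why the two analyzers are paired like this, here is a rough pure-Python imitation of what they emit. It is only a sketch of the token streams (the function names are made up for illustration and nothing here talks to Elasticsearch), but it shows that the untouched query token only has to equal one of the indexed prefixes.

```python
def keyword_analyzer(text):
    """Imitate keyword tokenizer + lowercase + trim: the whole string is one token."""
    return [text.lower().strip()]


def prefix_analyzer(text, min_gram=1, max_gram=20):
    """Imitate keyword tokenizer + lowercase + trim + edge n-gram filter:
    every leading prefix of the single token, from min_gram to max_gram chars."""
    token = keyword_analyzer(text)[0]
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def matches(indexed_value, query_text):
    """A match on name.prefix succeeds when the single query token
    equals one of the prefixes stored at index time."""
    return keyword_analyzer(query_text)[0] in prefix_analyzer(indexed_value)


if __name__ == "__main__":
    print(prefix_analyzer("Star Wars"))
    print(matches("Star Wars", "sta"))     # prefix of the whole name
    print(matches("Star Wars", "star w"))  # prefix spanning the space
    print(matches("Star Wars", "wars"))    # not a leading prefix
```

Because the keyword tokenizer keeps the whole name as one token, a query like "wars" does not match "star wars" in this setup; only leading prefixes of the full name do.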

The following query will give the top 10 suggestions:

GET netflix/movie/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "tags": "sci-fi"
        }
      },
      "query": {
        "match": {
          "name.prefix": "sta"
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "unique_movie_name": {
      "terms": {
        "field": "name.raw",
        "size": 10
      }
    }
  }
}

Results will be something like

"aggregations": {
      "unique_movie_name": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "star trek",
               "doc_count": 1
            },
            {
               "key": "star wars",
               "doc_count": 1
            }
         ]
      }
   }

UPDATE:

To find out which field matched, I think you could use highlighting. The highlight section will give you the whole word and the field it matched. You can also use inner hits, with highlighting inside them, to get the nested docs as well.

{
  "query": {
    "query_string": {
      "query": "sta*"
    }
  },
  "_source": false,
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}
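On the client side, reading the matched field back out of such a response could look roughly like this. The sample response is hypothetical and trimmed to the parts used; it assumes the usual shape where each hit carries a "highlight" object keyed by field name.

```python
def matched_fields(response):
    """Map each hit's _id to the fields (and fragments) that were highlighted."""
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        # Hits without a highlight section simply map to an empty dict.
        result[hit["_id"]] = dict(hit.get("highlight", {}))
    return result


# Hypothetical response (assumption), not output from a real cluster.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"name": ["<em>Star</em> Wars"]}},
            {"_id": "2", "highlight": {"description": ["A <em>starship</em> crew explores space"]}},
        ]
    }
}

if __name__ == "__main__":
    for doc_id, fields in matched_fields(sample_response).items():
        print(doc_id, list(fields))
```

The keys of each inner dict are the field names that matched, which is the piece of information the question asks for.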

Upvotes: 1

vinod_vh

Reputation: 1061

Don't create the mapping separately; insert the data directly into the index, and Elasticsearch will create a default mapping for it. Use the query below for autocomplete.

GET /netflix/movie/_search
{
  "query": {
    "query_string": {
      "query": "sta*"
    }
  }
}

Upvotes: 1
