Avión

Reputation: 8386

Elasticsearch phrase suggester not working as expected, only gives one good fix

With the following book indexed:

curl -X PUT localhost:9200/books/book/1 -d '{
    "title": "All Quiet on the Western Front",
    "author": "Erich Maria Remarque",
    "year": 1929
}'

I'm trying to implement a phrase suggester using the code from the official docs, so I tried:

curl -XPOST 'localhost:9200/books/_search' -d '{
  "suggest" : {
    "text" : "al quet",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "body",
        "field" : "bigram",
        "size" : 1,
        "real_word_error_likelihood" : 0.95,
        "max_errors" : 0.5,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "title",
          "suggest_mode" : "always",
          "min_word_length" : 1
        } ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}'

I expect this to correct al quet to all quiet.

But I get the following error:

  "error" : {
    "root_cause" : [ {
      "type" : "illegal_argument_exception",
      "reason" : "Analyzer [body] doesn't exists"

If I change "analyzer" : "body" to "analyzer" : "title", I get the same error, but for title:

  "error" : {
    "root_cause" : [ {
      "type" : "illegal_argument_exception",
      "reason" : "Analyzer [title] doesn't exists"

If I change "analyzer" : "body" to "analyzer" : "default", the analyzer error goes away, but I get an error for the next line, "field" : "bigram":

  "error" : {
     "root_cause" : [ {
       "type" : "illegal_argument_exception",
       "reason" : "No mapping found for field [bigram]"

The only way I could make this work was to use "analyzer" : "default" and "field" : "title":

curl -XPOST 'localhost:9200/books/_search?pretty=true' -d '{
  "suggest" : {
    "text" : "al quet",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "default",
        "field" : "title",
        "size" : 1,
        "real_word_error_likelihood" : 0.95,
        "max_errors" : 0.5,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "title",
          "suggest_mode" : "always",
          "min_word_length" : 1
        } ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}'

With this I get the following output:

 "suggest" : {
    "simple_phrase" : [ {
      "text" : "al quet",
      "offset" : 0,
      "length" : 7,
      "options" : [ {
        "text" : "al quiet",
        "highlighted" : "al <em>quiet</em>",
        "score" : 0.09049256
      } ]
    } ]
  }

As you can see, it corrects quet to quiet but leaves al unchanged. The same happens in all my other attempts: only one word ever gets corrected.

How can I build a working phrase suggester that, in this example, takes al quet as input and returns all quiet?

Upvotes: 2

Views: 1678

Answers (1)

ChintanShah25

Reputation: 12672

You got the first error because there is no analyzer named body in your index, and the same goes for title, which is a field name rather than an analyzer.

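The second error is due to the missing field bigram: your index has only three fields, namely title, author and year. The body analyzer and bigram field in the docs' example are names from the docs' own example setup, not built-ins.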
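You can check this yourself with two standard endpoints: the mapping lists the fields the index actually has, and the settings show any custom analyzers. An index created automatically on the first PUT, as books was, defines no custom analyzers, so neither body nor title exists as an analyzer there.

```shell
# Fields Elasticsearch created dynamically when the book was indexed
curl -XGET 'localhost:9200/books/_mapping?pretty'

# Index settings; custom analyzers would appear under "analysis"
curl -XGET 'localhost:9200/books/_settings?pretty'
```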

With your current setup, for the suggester to correct both words you need to give max_errors a higher value. From the docs, max_errors is

the maximum percentage of the terms that at most considered to be misspellings in order to form a correction. This method accepts a float value in the range [0..1) as a fraction of the actual query terms or a number >=1 as an absolute number of query terms. The default is set to 1.0 which corresponds to that only corrections with at most 1 misspelled term are returned. Note that setting this too high can negatively impact performance. Low values like 1 or 2 are recommended otherwise the time spend in suggest calls might exceed the time spend in query execution.

So increasing max_errors (from 0.5 to 0.9 in the request below) should give you the desired output:

{
  "suggest": {
    "text": "al quet",
    "simple_phrase": {
      "phrase": {
        "analyzer": "default",
        "field": "title",
        "size": 1,
        "real_word_error_likelihood": 0.95,
        "max_errors": 0.9,
        "gram_size": 2,
        "direct_generator": [
          {
            "field": "title",
            "suggest_mode": "always",
            "min_word_length": 1
          }
        ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  },
  "size": 0
}
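To see why 0.5 fixes only one word of "al quet" while 0.9 fixes both, here is a small sketch of the arithmetic. It assumes that a fractional max_errors is multiplied by the number of input terms and rounded to the nearest whole number, while a value of 1 or more is used as an absolute count; that rounding rule is my reading of the suggester's behaviour, not something stated in the quoted docs. max_corrections is a hypothetical helper, not part of Elasticsearch.

```shell
# Hypothetical helper: number of terms the suggester may correct.
# $1 = max_errors, $2 = number of terms in the input text.
# Assumption: fractions below 1 are scaled by the term count and
# rounded; values >= 1 are truncated to an absolute term count.
max_corrections() {
  awk -v e="$1" -v n="$2" 'BEGIN {
    if (e >= 1) { printf "%d\n", int(e) }
    else        { printf "%d\n", int(e * n + 0.5) }
  }'
}

# "al quet" has 2 terms
for e in 0.5 0.9 1 2; do
  echo "max_errors=$e -> $(max_corrections "$e" 2) term(s)"
done
```

Under that assumption, 0.5 gives a budget of one corrected term, which matches the output in the question, while 0.9, or the absolute value 2 that the docs recommend, allows both al and quet to be corrected.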

You might also want to use shingles for phrases, and collate to return only suggestions that actually exist in the index. I have given a detailed answer to this question, which might help.
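For the shingle part, here is a minimal sketch assuming Elasticsearch 2.x, where text fields are mapped with "type": "string" (on 5.x and later use "text" instead). The names shingle_filter, shingle_analyzer and title.shingle are hypothetical, chosen for this example; they play the role of body and bigram in the docs' request. Collate is not included. Note that the first command deletes the existing books index and its data.

```shell
# WARNING: drops the existing index and all its documents
curl -XDELETE 'localhost:9200/books'

# Recreate the index with an analyzer that emits unigrams plus
# two-word shingles, used by a title.shingle sub-field
curl -XPUT 'localhost:9200/books' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": true
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "shingle": { "type": "string", "analyzer": "shingle_analyzer" }
          }
        }
      }
    }
  }
}'

# Re-index the book; refresh=true makes it searchable immediately
curl -XPUT 'localhost:9200/books/book/1?refresh=true' -d '{
  "title": "All Quiet on the Western Front",
  "author": "Erich Maria Remarque",
  "year": 1929
}'

# Score candidates against the shingle field; the direct generator
# still proposes single-word corrections from the plain title field
curl -XPOST 'localhost:9200/books/_search?pretty=true' -d '{
  "size": 0,
  "suggest": {
    "text": "al quet",
    "simple_phrase": {
      "phrase": {
        "field": "title.shingle",
        "size": 1,
        "gram_size": 2,
        "max_errors": 2,
        "direct_generator": [ {
          "field": "title",
          "suggest_mode": "always",
          "min_word_length": 1
        } ],
        "highlight": { "pre_tag": "<em>", "post_tag": "</em>" }
      }
    }
  }
}'
```

No analyzer is set in the suggest request, so it falls back to the search analyzer of title.shingle, and the bigram "all quiet" stored in that field can then support the two-word correction.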

Upvotes: 3
