Nariman Esmaiely Fard

Reputation: 615

Unexpected result from Elasticsearch phrase suggester when the first letter is misspelled

I'm using the Elasticsearch phrase suggester to correct users' misspellings. Everything works as I expect unless the user enters a query whose first letter is misspelled. In that case the phrase suggester returns nothing or returns unexpected results.

My documents and query are exactly the same as the examples in the phrase suggester documentation:

POST test/test?refresh=true
{"title": "noble warriors"}
POST test/test?refresh=true
{"title": "nobel prize"}

POST test/_search
{
  "suggest": {
    "text": "noble prize",
    "simple_phrase": {
      "phrase": {
        "field": "title.trigram",
        "size": 1,
        "gram_size": 3,
        "direct_generator": [ {
          "field": "title.trigram",
          "suggest_mode": "always"
        } ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

Response when the first letter is misspelled ("mobel prize"):

{
  "_shards": ...
  "hits": ...
  "timed_out": false,
  "took": 3,
  "suggest": {
    "simple_phrase" : [
      {
        "text" : "mobel prize",
        "offset" : 0,
        "length" : 11,
        "options" : []
      }
    ]
  }
}

Response when the 4th letter is misspelled ("noble prize"):

{
  "_shards": ...
  "hits": ...
  "timed_out": false,
  "took": 3,
  "suggest": {
    "simple_phrase" : [
      {
        "text" : "noble prize",
        "offset" : 0,
        "length" : 11,
        "options" : [ {
          "text" : "nobel prize",
          "highlighted": "<em>nobel</em> prize",
          "score" : 0.5962314
        }]
      }
    ]
  }
}

Upvotes: 0

Views: 79

Answers (1)

femtoRgon

Reputation: 33341

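Change your generator's prefix length. It defaults to 1, so a candidate must share its first character with the input term, which is why "mobel" never produces "nobel". From the documentation: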

prefix_length

The minimum number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don’t occur at the beginning of terms. (Old name "prefix_len" is deprecated)

A warning: setting the prefix length to zero on fuzzy-matching queries like this generally has a pretty significant adverse effect on performance.
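
A minimal sketch of that change, assuming the same test index and title.trigram field as in the question. The only difference from the question's request is the added prefix_length entry on the direct generator and the misspelled input text; it has not been run against the question's data, so treat it as a starting point rather than a verified fix:

```
POST test/_search
{
  "suggest": {
    "text": "mobel prize",
    "simple_phrase": {
      "phrase": {
        "field": "title.trigram",
        "size": 1,
        "gram_size": 3,
        "direct_generator": [ {
          "field": "title.trigram",
          "suggest_mode": "always",
          "prefix_length": 0
        } ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
```

With prefix_length at 0 the generator may consider candidates that differ from the input in their first character, at the performance cost noted above.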

Upvotes: 1
