Maksim

Reputation: 1241

Elasticsearch spell check suggestions even if first letter missed

I create an index like this:

curl --location --request PUT 'http://127.0.0.1:9200/test/' \
--header 'Content-Type: application/json' \
--data-raw '{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "properties" : {
            "word" : { "type" : "text" }
        }
    }
}'

Then I create a document:

curl --location --request POST 'http://127.0.0.1:9200/test/_doc/' \
--header 'Content-Type: application/json' \
--data-raw '{ "word":"organic" }'

And finally, search with an intentionally misspelled word:

curl --location --request POST 'http://127.0.0.1:9200/test/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "suggest": {
    "001" : {
      "text" : "rganic",
      "term" : {
        "field" : "word"
      }
    }
  }
}'

The word 'organic' has lost its first letter, and ES never returns suggestion options for this misspelling. It works absolutely fine for other misspellings such as 'orgnic', 'oragnc' and 'organi'. What am I missing?

Upvotes: 7

Views: 6176

Answers (2)

Talal Humaidi

Reputation: 11

You need to use candidate generators with the phrase suggester. See this passage from the Elasticsearch in Action book, page 444:

Having multiple generators and filters lets you do some neat tricks. For instance, if typos are likely to happen both at the beginning and end of words, you can use multiple generators to avoid expensive suggestions with low prefix lengths by using the reverse token filter, as shown in figure F.4. You’ll implement what’s shown in figure F.4 in listing F.4:

■ First, you’ll need an analyzer that includes the reverse token filter.

■ Then you’ll index the correct product description in two fields: one analyzed with the standard analyzer and one with the reverse analyzer.

From the Elasticsearch docs:

The following example shows a phrase suggest call with two generators: the first one is using a field containing ordinary indexed terms, and the second one uses a field that uses terms indexed with a reverse filter (tokens are index in reverse order). This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions. The pre_filter and post_filter options accept ordinary analyzer names.

So you can achieve this by adding a second generator on a reversed field and passing the reverse analyzer as its pre_filter and post_filter.

And as you can see they said:

This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions.

Check this figure from the Elasticsearch in Action book; I believe it makes the idea clearer.

[Figure: screenshot from the book showing how Elasticsearch arrives at the correct phrase]

For more information, refer to the docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-suggesters-phrase.html#:~:text=The%20phrase%20suggester%20uses%20candidate,individual%20term%20in%20the%20text.

If I explained the full idea, this would be a very long answer, but this gives you the key: look into using the phrase suggester with multiple generators.
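As a rough, untested sketch of the two-generator approach described above, adapted to the question's data: the index name "test_reverse", the analyzer name "reverse_words" and the sub-field "word.reverse" are made up for illustration, and ES_URL is assumed to point at a scratch cluster. The request shape follows the phrase suggester example in the docs; whether it ranks 'organic' first on a real dataset needs checking.

```shell
# Assumed names (not from the question): index "test_reverse",
# analyzer "reverse_words", sub-field "word.reverse".
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# "word" is analyzed normally; "word.reverse" stores each token reversed,
# so a typo in the first letter becomes a typo in the last letter.
INDEX_BODY=$(cat <<'EOF'
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "reverse_words": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "reverse"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "word": {
        "type": "text",
        "fields": {
          "reverse": { "type": "text", "analyzer": "reverse_words" }
        }
      }
    }
  }
}
EOF
)

# Phrase suggester with two direct generators: one on the plain field,
# one on the reversed field with the reverse analyzer as pre/post filter.
SUGGEST_BODY=$(cat <<'EOF'
{
  "suggest": {
    "001": {
      "text": "rganic",
      "phrase": {
        "field": "word",
        "direct_generator": [
          { "field": "word", "suggest_mode": "always" },
          {
            "field": "word.reverse",
            "suggest_mode": "always",
            "pre_filter": "reverse_words",
            "post_filter": "reverse_words"
          }
        ]
      }
    }
  }
}
EOF
)

create_index() {
  curl --location --request PUT "$ES_URL/test_reverse" \
    --header 'Content-Type: application/json' \
    --data-raw "$INDEX_BODY"
}

index_word() {
  # refresh=true makes the document visible to the next search immediately
  curl --location --request POST "$ES_URL/test_reverse/_doc/?refresh=true" \
    --header 'Content-Type: application/json' \
    --data-raw '{ "word":"organic" }'
}

suggest() {
  curl --location --request POST "$ES_URL/test_reverse/_search" \
    --header 'Content-Type: application/json' \
    --data-raw "$SUGGEST_BODY"
}
```

To try it, run create_index, then index_word, then suggest. Both generators keep the default prefix length, which is the point of the reversed field: 'rganic' reversed is 'cinagr', which shares its first letter with the reversed 'organic'.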

Upvotes: 1

Emanuil Tolev

Reputation: 315

This is happening because of the prefix_length parameter (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html). It defaults to 1, i.e. at least 1 letter from the beginning of the term has to match, and 'rganic' does not start with the same letter as 'organic'. You can set prefix_length to 0, but this will have performance implications. Only your hardware, your setup and your dataset can show you exactly what those will be in practice, so try it :). However, be careful: the Elasticsearch and Lucene devs set the default to 1 for a reason.

Here's a query which for me returns the suggestion result you're after on Elasticsearch 7.4.0 after I perform your setup steps.

curl --location --request POST 'http://127.0.0.1:9200/test/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "suggest": {
    "001" : {
      "text" : "rganic",
      "term" : {
        "field" : "word",
        "prefix_length": 0
      }
    }
  }
}'
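If the cost of prefix_length 0 is a worry, one possible mitigation (an assumption on my part, not something I have benchmarked) is to narrow the candidate search with other term suggester options such as max_edits and size. A sketch, with ES_URL as an assumed variable for the cluster address:

```shell
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# prefix_length 0 lets the first letter differ; max_edits 1 limits
# candidates to a single edit; size caps the number of options returned.
TERM_SUGGEST_BODY=$(cat <<'EOF'
{
  "suggest": {
    "001": {
      "text": "rganic",
      "term": {
        "field": "word",
        "prefix_length": 0,
        "max_edits": 1,
        "size": 3
      }
    }
  }
}
EOF
)

suggest_no_prefix() {
  curl --location --request POST "$ES_URL/test/_search" \
    --header 'Content-Type: application/json' \
    --data-raw "$TERM_SUGGEST_BODY"
}
```

The trade-off is that max_edits 1 drops corrections that need two edits, so an input like 'oragnc' from the question would probably stop getting a suggestion.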

Upvotes: 6
