Guy Korland

Reputation: 9568

ElasticSearch completion suggester Standard Analyzer not working

We're using ElasticSearch completion suggester with the Standard Analyzer, but it seems like the text is not tokenized.

e.g.

Texts: "First Example", "Second Example"

Search: "Fi" returns "First Example"

While

Search: "Ex" doesn't return any result, where we expected "First Example".

Upvotes: 3

Views: 3459

Answers (3)

Dainius Jocas

Reputation: 29

One way to hack in suggestions from every position of the string is to shingle the string, keep only the shingles at position 0, and then take the last token of each shingle.

PUT example
{
  "settings": {
    "index.max_shingle_diff": 10,
    "analysis": {
      "filter": {
        "after_last_space": {
          "type": "pattern_replace",
          "pattern": "(.* )",
          "replacement": ""
        },
        "preserve_only_first": {
          "type": "predicate_token_filter",
          "script": {
            "source": "token.position == 0"
          }
        },
        "big_shingling": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 10,
          "output_unigrams": true
        }
      },
      "analyzer": {
        "dark_magic": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "big_shingling",
            "preserve_only_first",
            "after_last_space"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "analyzer": "dark_magic",
        "search_analyzer": "standard"
      }
    }
  }
}

This hack works for short strings (up to 10 tokens in the example).
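
To make the filter chain easier to follow, here is a rough pure-Python emulation of what the "dark_magic" analyzer is meant to emit. It does not call Elasticsearch; the function names and the `\w+` tokenizer are assumptions that only approximate the standard tokenizer.

```python
import re

def shingles_at_position_zero(tokens, max_shingle_size=10):
    """Unigram plus every shingle that starts at the first token."""
    limit = min(len(tokens), max_shingle_size)
    return [" ".join(tokens[:n]) for n in range(1, limit + 1)]

def dark_magic(text, max_shingle_size=10):
    # Rough stand-in for the standard tokenizer + lowercase filter.
    tokens = re.findall(r"\w+", text.lower())
    # big_shingling + preserve_only_first: keep only shingles at position 0.
    kept = shingles_at_position_zero(tokens, max_shingle_size)
    # after_last_space: drop everything up to and including the last space.
    return [re.sub(r"(.* )", "", shingle) for shingle in kept]

print(dark_magic("First Example"))  # ['first', 'example']
```

In this emulation, a string with more than 10 tokens yields only its first 10 tokens, which is where the length limit comes from.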

Upvotes: 0

M.Vanderlee

Reputation: 2996

A good workaround is to tokenize the string yourself and store the tokens in a separate "tokens" field. You can then put two suggestions in your suggest query, one for each field.

Example:

PUT /example
{
    "mappings": {
        "doc": {
            "properties": {
                "full": {
                    "type": "completion"
                },
                "tokens": {
                    "type": "completion"
                }
            }
        }
    }
}

POST /example/doc/_bulk
{ "index":{} }
{"full": {"input": "First Example"}, "tokens": {"input": ["First", "Example"]}}
{ "index":{} }
{"full": {"input": "Second Example"}, "tokens": {"input": ["Second", "Example"]}}
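
If the documents are generated by a client, the bulk body above could be built with a small helper along these lines. This is only a sketch: `completion_doc` and `bulk_body` are hypothetical names, and splitting on whitespace is an assumption that may be too naive for real text.

```python
import json

def completion_doc(text):
    """Build one document for the 'full' and 'tokens' completion fields."""
    return {
        "full": {"input": text},
        "tokens": {"input": text.split()},  # naive whitespace tokenization
    }

def bulk_body(texts):
    """Serialize the documents as NDJSON for the _bulk endpoint."""
    lines = []
    for text in texts:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(completion_doc(text)))
    return "\n".join(lines) + "\n"  # _bulk expects a trailing newline

print(bulk_body(["First Example", "Second Example"]), end="")
```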

POST /example/_search
{
    "suggest": {
        "full-suggestion": {
            "prefix" : "Ex", 
            "completion" : { 
                "field" : "full",
                "fuzzy": true
            }
        },
        "token-suggestion": {
            "prefix": "Ex",
            "completion" : { 
                "field" : "tokens",
                "fuzzy": true
            }
        }
    }
}

Search result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "full-suggestion": [
      {
        "text": "Ex",
        "offset": 0,
        "length": 2,
        "options": []
      }
    ],
    "token-suggestion": [
      {
        "text": "Ex",
        "offset": 0,
        "length": 2,
        "options": [
          {
            "text": "Example",
            "_index": "example",
            "_type": "doc",
            "_id": "Ikvk62ABd4o_n4U8G5yF",
            "_score": 2,
            "_source": {
              "full": {
                "input": "First Example"
              },
              "tokens": {
                "input": [
                  "First",
                  "Example"
                ]
              }
            }
          },
          {
            "text": "Example",
            "_index": "example",
            "_type": "doc",
            "_id": "I0vk62ABd4o_n4U8G5yF",
            "_score": 2,
            "_source": {
              "full": {
                "input": "Second Example"
              },
              "tokens": {
                "input": [
                  "Second",
                  "Example"
                ]
              }
            }
          }
        ]
      }
    ]
  }
}

Upvotes: 1

Trong Lam Phan

Reputation: 2412

According to the Elastic documentation on the Completion Suggester:

The completion suggester is a so-called prefix suggester.

So when you send a keyword, it only matches texts that begin with that keyword.

E.g:

Search: "Fi" => "First Example"

Search: "Sec" => "Second Example"

but if you give Elastic "Ex", it returns nothing because it cannot find a text which begins with "Ex".
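
As a simplified model of this behaviour (not how Elasticsearch implements it internally), a prefix suggester can be pictured as a case-insensitive "starts with" check against the whole input. The function name `prefix_suggest` is made up for illustration.

```python
def prefix_suggest(prefix, inputs):
    """Return the inputs that begin with the prefix, ignoring case."""
    needle = prefix.lower()
    return [text for text in inputs if text.lower().startswith(needle)]

texts = ["First Example", "Second Example"]
print(prefix_suggest("Fi", texts))   # ['First Example']
print(prefix_suggest("Sec", texts))  # ['Second Example']
print(prefix_suggest("Ex", texts))   # []
```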

You can try another suggester instead, such as the Term Suggester.

Upvotes: 3
