Simon Steinberger

Reputation: 6825

ElasticSearch term suggest on analyzed field returns no suggestions

I'd like to use the ElasticSearch term suggester for spelling corrections ("Did you mean ...?"). Here's the official documentation:

Here's my schema (shortened to the basics):

{
    "settings": {
        "analysis": {
            "filter": {
                "en_stop_filter": { "type": "stop", "stopwords": ["_english_"] },
                "en_stem_filter": { "type": "stemmer", "name": "minimal_english" },
                "de_stop_filter": { "type": "stop", "stopwords": ["_german_"] },
                "de_stem_filter": { "type": "stemmer", "name": "minimal_german" }
            },
            "analyzer": {
                "en_analyzer": { "type": "custom", "tokenizer": "icu_tokenizer", "filter": ["icu_folding", "icu_normalizer", "en_stop_filter", "en_stem_filter"] },
                "de_analyzer": { "type": "custom", "tokenizer": "icu_tokenizer", "filter": ["icu_folding", "icu_normalizer", "de_stop_filter", "de_stem_filter"] }
            }
        }
    },
    "mappings": {
        "blog": {
            "_analyzer": { "path": "my_analyzer", "index": "no" },
            "properties": {
                "title": { "type": "string" },
                "my_analyzer": { "type": "string", "index": "no" }
            }
        },
        "photo": {
            "properties": {
                "tags_en": { "type": "string", "analyzer": "en_analyzer", "index_name": "tag_en" },
                "tags_de": { "type": "string", "analyzer": "de_analyzer", "index_name": "tag_de" }
            }
        }
    }
}

And this is how data is indexed via Python/Django for our blog:

import json
import requests

# Build the newline-delimited bulk body: one action line plus one document line per post.
data = ''
for p in BlogPost.objects.all():
    data += '{"index": {"_id": "%s"}}\n' % p.pk
    data += json.dumps({ "my_analyzer": p.language+"_analyzer", "title": p.title })+'\n'
resp = requests.put(ELASTICSEARCH_URL+'blog/_bulk', data=data)

I'm setting the analyzer according to the language of each blog post (p.language = 'de' or 'en'), either German or English.

I'm able to search this index (via Python) and I do get spelling suggestions returned with these params:

{
  "query": {
    "query_string": {
      "query": q,
      "analyzer": "en_analyzer"
    }
  },
  "suggest": {
    "my_suggestion": {
      "text": q,
      "term": {
        "size": 1,
        "field": "title"
      }
    }
  }
}

However, what I really need are spelling suggestions for searches on our photo schema, which is indexed like this (Python/Django):

data = ''
for p in Photo.objects.all():
    data += '{"index": {"_id": "%s"}}\n' % p.pk
    data += json.dumps({
        "tags_en": p.tags_en,
        "tags_de": p.tags_de
    })+'\n'
resp = requests.put(ELASTICSEARCH_URL+'photo/_bulk', data=data)

p.tags_en and p.tags_de may be indexed as comma-separated tag strings, or as actual lists of strings. Both work for ElasticSearch and it doesn't seem to make a difference for this problem.
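To rule out the tag format as a factor, the bulk body can be built so that tags are always sent as lists. This is only a minimal sketch: `normalize_tags` and `build_bulk_body` are hypothetical helper names, and the input is assumed to be plain `(pk, tags_en, tags_de)` tuples rather than Django model instances.

```python
import json


def normalize_tags(tags):
    """Return tags as a list of stripped, non-empty strings.

    Accepts either a comma-separated string or an iterable of strings.
    """
    if isinstance(tags, str):
        tags = tags.split(",")
    return [t.strip() for t in tags if t and t.strip()]


def build_bulk_body(photos):
    """Build a newline-delimited bulk body from (pk, tags_en, tags_de) tuples."""
    lines = []
    for pk, tags_en, tags_de in photos:
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_id": str(pk)}}))
        lines.append(json.dumps({
            "tags_en": normalize_tags(tags_en),
            "tags_de": normalize_tags(tags_de),
        }))
    # The bulk API expects every line, including the last one, to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string can be sent to the photo/_bulk endpoint exactly as in the snippet above.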

Searching photos works, both in English and German, but no spelling suggestions ever get returned:

{
  "query": {
    "query_string": {
      "query": q,
      "fields": [
        "tags_en"
      ],
      "analyzer": "en_analyzer"
    }
  },
  "suggest": {
    "my_suggestion": {
      "text": q,
      "term": {
        "size": 1,
        "field": "tags_en"
      }
    }
  }
}
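To check what the suggester actually returns for each token, the suggest part of the response can be flattened. This is a sketch that assumes the usual response layout, with results under a top-level "suggest" key and each entry carrying "text" and "options"; `extract_suggestions` is a hypothetical helper and the sample response is invented for illustration.

```python
def extract_suggestions(response, name="my_suggestion"):
    """Return {token: [suggested_text, ...]} for one named suggester.

    A token mapped to an empty list means the suggester saw the token
    but found no candidate terms for it in the suggest field.
    """
    result = {}
    for entry in response.get("suggest", {}).get(name, []):
        result[entry["text"]] = [opt["text"] for opt in entry.get("options", [])]
    return result


# Illustrative response only; the values are made up, not taken from a real index.
sample_response = {
    "suggest": {
        "my_suggestion": [
            {"text": "hous", "offset": 0, "length": 4,
             "options": [{"text": "house", "score": 0.75, "freq": 3}]},
            {"text": "xyz", "offset": 5, "length": 3, "options": []},
        ]
    }
}
suggestions = extract_suggestions(sample_response)
```

An empty option list for every token, as opposed to a missing "suggest" section, would indicate that the request was accepted but no candidates were found in the field being suggested on.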

It doesn't make a difference if I define an analyzer for the suggestion term, like this:

{
  "query": {
    "query_string": {
      "query": q,
      "fields": [
        "tags_en"
      ],
      "analyzer": "en_analyzer"
    }
  },
  "suggest": {
    "my_suggestion": {
      "text": q,
      "term": {
        "size": 1,
        "field": "tags_en",
        "analyzer": "en_analyzer"
      }
    }
  }
}

Note the difference in analysis between blog posts and photos: our blog posts are analyzed in one language per post, via the my_analyzer field in the schema. Our photos, however, are analyzed on a per-field basis. We have 20 languages (only two are shown here to keep the code as small as possible) and each tag field is analyzed accordingly. If I remove this per-field analysis for photos, I get suggestions there as well, but we really do need the field-based analyzers.

So the issue must have something to do with the analyzers, but I'm totally stuck. Any ideas?

Upvotes: 4

Views: 1539

Answers (1)

Simon Steinberger

Reputation: 6825

A working workaround is to include an additional non-analyzed field in the schema and to run term suggestions against that field only. It works for us, although it should be possible without this extra data.
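A minimal sketch of what this workaround might look like, reduced to the English tag field. The field name `tags_en_raw` and both helper functions are hypothetical, and "index": "not_analyzed" is assumed to be available for string fields, as in the ElasticSearch versions that use the "string" type.

```python
def photo_mapping_with_raw_field():
    """Mapping sketch: analyzed field for search, non-analyzed copy for suggestions."""
    return {
        "photo": {
            "properties": {
                "tags_en": {"type": "string", "analyzer": "en_analyzer"},
                # Hypothetical extra field; terms are stored exactly as sent.
                "tags_en_raw": {"type": "string", "index": "not_analyzed"},
            }
        }
    }


def search_body_with_suggest(q, suggest_field="tags_en_raw"):
    """Query the analyzed field, but ask for term suggestions on the raw field."""
    return {
        "query": {
            "query_string": {
                "query": q,
                "fields": ["tags_en"],
                "analyzer": "en_analyzer",
            }
        },
        "suggest": {
            "my_suggestion": {
                "text": q,
                "term": {"size": 1, "field": suggest_field},
            }
        },
    }
```

With this layout each document has to carry the tags twice, once per field. Because a non-analyzed field stores every value as a single term, the tags should be sent as a list of strings rather than one comma-separated string.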

Upvotes: 1
