Re-using inbuilt language filters?

Question

I saw the question here, which shows how one can create a custom analyzer to have both synonym support and support for languages.

However, it seems to create its own stemmer and stopwords collection as well.

What if I want to add synonyms to the inbuilt "danish" analyzer? Can I refer to the inbuilt Danish stemmer and stopwords filters? For example, are they simply called danish_stemmer and danish_stopwords?

Perhaps a list of inbuilt filters would help - where can I see the names of these inbuilt filters?

Nikolay Vasiliev · Accepted Answer

For each pre-built language analyzer, the documentation includes an example of how to rebuild it as a custom analyzer. For Danish, the example is:

PUT /danish_example
{
  "settings": {
    "analysis": {
      "filter": {
        "danish_stop": {
          "type":       "stop",
          "stopwords":  "_danish_" 
        },
        "danish_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["eksempel"] 
        },
        "danish_stemmer": {
          "type":       "stemmer",
          "language":   "danish"
        }
      },
      "analyzer": {
        "rebuilt_danish": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "danish_stop",
            "danish_keywords",
            "danish_stemmer"
          ]
        }
      }
    }
  }
}

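This is essentially building your own custom analyzer. The filter names danish_stop, danish_keywords and danish_stemmer are defined by this request; they are not inbuilt names. What is inbuilt are the filter types (stop, keyword_marker, stemmer) and the values they accept, such as the _danish_ stopwords list and the danish stemmer language.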
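To add synonyms, one option is to define a synonym filter next to the others and list it in the analyzer's filter chain. The request below is a minimal sketch, not a definitive setup: the index name danish_synonym_example, the filter name danish_synonyms, the analyzer name danish_with_synonyms and the synonym pair "bil, automobil" are all made up for illustration. The keyword_marker filter is left out here on the assumption that no words need to be protected from stemming, and the synonym filter is placed before the stemmer so that the rules can be written with unstemmed words.

```console
# Hypothetical index, filter and analyzer names and synonym pair; adjust to your data.
PUT /danish_synonym_example
{
  "settings": {
    "analysis": {
      "filter": {
        "danish_stop": {
          "type":       "stop",
          "stopwords":  "_danish_"
        },
        "danish_synonyms": {
          "type":       "synonym",
          "synonyms":   ["bil, automobil"]
        },
        "danish_stemmer": {
          "type":       "stemmer",
          "language":   "danish"
        }
      },
      "analyzer": {
        "danish_with_synonyms": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "danish_stop",
            "danish_synonyms",
            "danish_stemmer"
          ]
        }
      }
    }
  }
}
```

The analyze API can then be used to inspect the tokens the new analyzer emits for a sample word and check that the synonyms are applied as intended:

```console
GET /danish_synonym_example/_analyze
{
  "analyzer": "danish_with_synonyms",
  "text":     "automobil"
}
```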

The available stemmer languages are listed in the documentation for the stemmer token filter. The pre-built stopwords lists (such as _danish_) are listed in the documentation for the stop token filter.

Hope that helps!
