English analyzer (stemming) in ElasticSearch does not work

I tried to apply a custom English analyzer, as well as the built-in English analyzer, in Elasticsearch. My main goal is stemming. Say I have the following words in my documents: covers, impression.

Now, if I search for e.g. cover, impressive, or impressions, I get 0 results. I only get hits if I search for the exact terms "covers" or "impression".
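
For stemmed terms to match at search time, the query text also has to go through analysis. As a hedged sketch (the index name my_index and the field title are taken from the question's update; the exact query the asker ran is not shown): a match query runs the query string through the field's analyzer before looking it up, whereas a term query skips analysis and compares the raw string against the indexed tokens.

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "impressions"
    }
  }
}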

These are my settings in Elasticsearch (following this documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html):

{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_" 
        },
        "english_stemmer": {
          "type":       "stemmer",
          "language":   "english"
        },
        "english_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        }
      },
      "analyzer": {
        "rebuilt_english": {
          "tokenizer":  "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer"
          ]
        }
      }
    }
  }
}

My mapping looks as follows:

"mappings": {
  "_doc": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "rebuilt_english"
      },
      "description": {
        "type": "text",
        "analyzer": "rebuilt_english"
      }
    }
  }
}

I also tried (following a few different tutorials) to change the settings like this (I only show the changes here, not the full code again):

{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_english": {
          "type": "custom",
          "filter": #and so on...

Am I missing something here? As far as I understand, I need to define the settings for a specific analyzer under "settings", give it a name, and then use that name in the "mappings" properties, so that every field is analyzed according to those settings.
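
The flow described above (define a named analyzer under "settings" > "analysis", then reference that name from a field in "mappings") can be sketched as a single index-creation request. This is a minimal, hedged sketch: my_index, title, my_english and my_english_stemmer are illustrative names, only a lowercase + English stemmer chain is included, and it assumes Elasticsearch 7.x or later, where mappings are typeless (on 6.x the "properties" block would sit under a type name such as _doc, as in the question). Note that "settings" and "mappings" are siblings at the top level, and "filter" and "analyzer" are siblings inside "analysis".

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      },
      "analyzer": {
        "my_english": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_english_stemmer"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_english"
      }
    }
  }
}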

I also tried not defining any custom settings and just setting the analyzer property (in the mapping) for each field, like:

"title": {"type": "text",
"analyzer": "english"}

That doesn't work either (even when using filters like stemming).
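
One way to check whether stemming happens at all, independent of any index or mapping, is the _analyze API with the built-in english analyzer. A hedged sketch using the words from the question: the response lists the tokens the analyzer produces, so if stemming works, covers and cover (and impression and impressions) should come back as the same token.

GET /_analyze
{
  "analyzer": "english",
  "text": "covers cover impression impressions"
}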

I have really tried to find a solution for hours, but I can't get it to work. Any help would be much appreciated. Thanks!

UPDATE

This is the code I used to create the index (my latest attempt; as described above, I also tried other ways to apply the analyzer):

PUT /my_index

{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_english": {
          "type": "custom",
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer"
            ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": { "type": "text",
          "analyzer": "rebuilt_english"
        },
        "description": { "type": "text",
                    "analyzer": "rebuilt_english"}
                    }
        }
      }
    }
}

Answers (3)

Akoffice

The analyzer below should work. The fix: since you have already defined "tokenizer": "standard", don't also define a "type": "standard" field.

PUT /analyzers_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_stemmer",
            "lowercase"
          ]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}

emon

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_stop": {
          "type": "standard",
          "stopwords": "_english_"
        },
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      }
    }
  }
}

POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "I'm in the mood for drinking semi-dry wine!"
}

I think this will help. Thanks.

Evaldas Buinauskas

Your issue is that your "filter" key, which holds all your named filters, was in the wrong place. It was placed inside "analyzer", but it is supposed to be a sibling key of "analyzer".

So my bet is that the following config should work as expected:

{
  "settings":{
    "analysis":{
      "filter":{
        "english_stop":{
          "type":"stop",
          "stopwords":"_english_"
        },
        "english_stemmer":{
          "type":"stemmer",
          "language":"english"
        },
        "english_possessive_stemmer":{
          "type":"stemmer",
          "language":"possessive_english"
        }
      },
      "analyzer":{
        "rebuilt_english":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":[
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings":{
    "_doc":{
      "properties":{
        "title":{
          "type":"text",
          "analyzer":"rebuilt_english"
        },
        "description":{
          "type":"text",
          "analyzer":"rebuilt_english"
        }
      }
    }
  }
}
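
After recreating the index with the corrected body, the stored configuration can be read back to confirm that "filter" ended up as a sibling of "analyzer" and that both fields reference rebuilt_english, and the analyzer itself can be exercised with _analyze. A hedged sketch, assuming the index is named my_index as in the question; an existing index generally has to be deleted and recreated, since the analyzer of an existing field cannot be changed in place.

GET /my_index/_settings

GET /my_index/_mapping

GET /my_index/_analyze
{
  "analyzer": "rebuilt_english",
  "text": "covers impressions"
}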
