Xiaohe Dong

Reputation: 5023

How to switch on Elasticsearch stemming

I don't know how to turn on English word stemming in Elasticsearch, and I couldn't find a clear example of how to do it.

Here is what I did:

Creating the index

PUT /staff/list/ -d 
{
  "settings" : {
    "analysis": {
      "analyzer": {
        "standard": {
          "type": "standard"
        }
      }
    }
  }
}

Adding a document

PUT /staff/list/jason
{
      "Title" : "searches"
}

When I search for "search":

GET /staff/list/_search?q=search

No results are returned.

What index settings do I need to make stemming work?

Many thanks in advance

Upvotes: 4

Views: 3674

Answers (2)

Eyal.Dahari

Reputation: 770

Please note that the default Elasticsearch analyzer does not support stemming.
To get stemming, you need to create a custom analyzer (or use one that already stems).
Here is how to do it:

Create the index and define an analyzer called my_analyzer

PUT /staff
{
  "settings" : {
    "analysis": {
      "filter": {
        "filter_snowball_en": {
          "type": "snowball",
          "language": "English"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "filter_snowball_en"
          ]
        }
      }
    }
  }
}
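
To check that the analyzer stems as expected before adding the mapping, the _analyze API can be pointed at the new index. This is a minimal sketch, assuming Elasticsearch 2.0 or later, where _analyze accepts a JSON body (on 1.x, pass analyzer and text as query-string parameters instead):

```
GET /staff/_analyze
{
  "analyzer": "my_analyzer",
  "text": "searches searched"
}
```

The response lists the tokens the analyzer produces. With the Snowball English filter in place, both words should come back as search.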

Configure a mapping that assigns my_analyzer to the title field of the list type

PUT /staff/_mapping/list
{
  "list": {
    "properties": {
      "title": {
        "type":     "string",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Index documents

PUT /staff/list/jason
{
   "title": "searches"
}


PUT /staff/list/debby
{
   "title": "searched open"
}

Search with a term that only matches through stemming

GET /staff/list/_search
{
  "query": {
    "query_string": {
      "query": "title:opened"
    }
  }
}

Result

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
      {
          "_index": "staff",
          "_type": "list",
          "_id": "debby",
          "_score": 1,
          "_source": {
              "title": "searched open"
          }
      }]
   }
}

As you can see in the search results, the debby document, which contains the term
open, was returned even though we were searching for opened.

Hope that helps.

Upvotes: 3

Igor Belo

Reputation: 738

When you create the index, your settings change nothing: they just re-declare the standard analyzer.

The standard analyzer is the default that Elasticsearch uses, and it does not stem words.

You need to map each field to the analyzer it should use when you create the index (see the mapping documentation):

PUT /staff
{
    "mappings": {
        "list": {
            "properties": {
                "Title": {
                  "type": "string",
                  "analyzer": "english"
                }
            }
        }
    }
}

I think the english analyzer fits your case. It uses the standard tokenizer and adds English stemming on top.
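
One thing to watch, assuming the 1.x/2.x releases that this mapping syntax targets: a bare q=search runs against the _all field, which is analyzed with the default analyzer rather than english, so it may still miss the document. A sketch of a field-qualified version of the search from the question:

```
GET /staff/list/_search?q=Title:search
```

Here Title is the field name from the mapping above. The query text is analyzed with that field's analyzer, so search and searches should reduce to the same term.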

Upvotes: 2
