Reputation: 87
Here is what I'd like the stemmer to do:
breaking: break
broke: break
broken: break
entering: enter
entered: enter
enter: enter
I've indexed the field as follows:

    "body": {
      "type": "text",
      "fields": {
        "stemmed": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
When I query “breaking and entering”, I can see that what is searched for in the body.stemmed field is “break and enter”, which looks right. However, when I query “broke and entered”, I get “broke and enter”, so “broke” does not become “break” under the "english" analyzer. Likewise, “broken and entered” becomes “broken and enter”. So ES apparently does not change either “broke” or “broken” to “break” (if the Snowball stemmer is what is used here, I guess that would explain why).
So, is there a way to specify a "known" stemmer that will accomplish what I'm trying to do?
Upvotes: 0
Views: 765
Reputation: 1547
Your requirement can be fulfilled by a dictionary stemmer, which stems words by looking them up in a dictionary. Algorithmic stemmers have no knowledge of root words; they simply apply suffix rules, which is why irregular forms such as “broke” and “broken” are left unchanged. Take a look at the Hunspell stemmer; I think it will do the job: https://www.elastic.co/guide/en/elasticsearch/guide/current/hunspell.html
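As a rough sketch of what that could look like, the index settings below define a `hunspell` token filter and use it in a custom analyzer on the `body.stemmed` field from the question. The names `en_US_hunspell` and `english_hunspell` are made up for illustration, and the typeless `mappings` layout assumes Elasticsearch 7 or later. It also assumes an en_US Hunspell dictionary (the `.aff` and `.dic` files) has been placed under `config/hunspell/en_US/` on every node. Whether “broke” and “broken” actually come out as “break” depends on the dictionary you install, so check the result with the `_analyze` API before relying on it.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "en_US_hunspell": {
          "type": "hunspell",
          "locale": "en_US",
          "dedup": true
        }
      },
      "analyzer": {
        "english_hunspell": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "en_US_hunspell"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "fields": {
          "stemmed": {
            "type": "text",
            "analyzer": "english_hunspell"
          }
        }
      }
    }
  }
}
```

The `lowercase` filter comes before the Hunspell filter because the dictionary lookup is case-sensitive by default.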
Upvotes: 2