Reputation: 1899
I am trying to stem the word tokens I get after tokenizing my data, using NLTK's PorterStemmer, but the results look incorrect. Which stemming algorithm would be the best one to use?
Code-
from nltk.stem import PorterStemmer

porter = PorterStemmer()
# Print the stem so the result is visible when run as a script
print(porter.stem("mobile"))
Code Output-
mobil
Expected Output-
mobile
Upvotes: 0
Views: 925
Reputation: 3174
You are probably looking for lemmatization rather than stemming. Check out https://www.guru99.com/stemming-lemmatization-python-nltk.html.
Stemming reduces a word to its root/base by stripping suffixes, so the result need not be a real word. Lemmatization reduces a word to its uninflected dictionary form (e.g. the infinitive for verbs).
The root of "mobile" is "mobil" because of related words like "mobility". In this case the unchanged root/base does not include the "e".
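To see why the stemmer drops the "e", it can help to stem a few related forms side by side. This is only a minimal sketch: the word list is my own choice for illustration and not from the question, and it assumes NLTK is installed (PorterStemmer itself needs no extra corpus download).

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()

# Related word forms chosen for illustration (not from the question)
words = ["mobile", "mobiles", "mobility"]

# Map each word to the stem PorterStemmer produces for it
stems = {word: porter.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

If the related forms come out with the same stem, the stemmer is doing its job: it groups word forms under a shared root, even though that root is not itself a dictionary word.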
Upvotes: 1