Zero

Reputation: 1899

Best stemming algorithm in NLTK, Python

I am using PorterStemmer to stem the word tokens I get after tokenizing my data, but the results are not what I expect. Which stemming algorithm would be the best one to use?

Code-

from nltk.stem import PorterStemmer

porter = PorterStemmer()
print(porter.stem("mobile"))

Code Output-

mobil

Expected Output-

mobile

Upvotes: 0

Views: 925

Answers (1)

EliasK93

Reputation: 3174

You might be looking for lemmatization and not stemming. Check out https://www.guru99.com/stemming-lemmatization-python-nltk.html.

Stemming means reducing a word to its root/base by stripping suffixes; the result does not have to be a real word. Lemmatization means reducing a word to its uninflected base form (e.g. the infinitive for verbs), which is a real word.

The root of "mobile" is "mobil" because of words like "mobility". In this case the shared root/base does not include the final "e", so the stemmer's output is correct for a stemmer.
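
A quick way to check this yourself is to stem both words and compare. This is just a sketch; the word list is my own choice, and you can run other stemmers (SnowballStemmer is included here) the same way to compare their output.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

words = ["mobile", "mobility"]

# Map each word to its Porter stem; related words should share one stem.
porter_stems = {w: porter.stem(w) for w in words}
print(porter_stems)

# Same words through the Snowball (Porter2) stemmer, for comparison.
snowball_stems = {w: snowball.stem(w) for w in words}
print(snowball_stems)
```

If both words map to the same string, the stemmer is doing its job: grouping related word forms, not producing dictionary words.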

Upvotes: 1
