Reputation: 365
I understand stemming and lemmatizing as follows:
stemming - reduces words to their unchanging portion; amusing, amusement -> amus
lemmatizing - converts words to their dictionary form; amusing, amusement -> amuse
I can understand why one would use lemmatization, but I don't get the purpose of stemming. Can you explain?
Upvotes: 2
Views: 212
Reputation: 13401
As you said, stemming reduces words to their unchanging portion, and lemmatizing converts words to their dictionary form.
Text representations like bag-of-words (BOW) or tf-idf are based on word frequency.
Let's take the example you provided in your question.
With stemming: amusing and amusement both return amus, so these words will be treated as the same token and the frequency of amus will be 2.
With lemmatization: amusing and amusement both return amuse, so again these words will be treated as the same token and the frequency of amuse will be 2.
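To make the counting concrete, here is a minimal sketch in Python. The suffix list, the toy_stem function and the TOY_LEMMAS dictionary are all made up for illustration and only mirror the example from the question; they are not a real stemming algorithm or a real lemmatizer.

```python
from collections import Counter

# Toy suffix list, longest first. An illustrative assumption, not a real stemmer.
SUFFIXES = ("ement", "ing", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma dictionary that mirrors the example in the question.
TOY_LEMMAS = {"amusing": "amuse", "amusement": "amuse"}

def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return TOY_LEMMAS.get(word, word)

tokens = ["amusing", "amusement"]
stem_counts = Counter(toy_stem(t) for t in tokens)
lemma_counts = Counter(toy_lemmatize(t) for t in tokens)

print(stem_counts)   # Counter({'amus': 2})
print(lemma_counts)  # Counter({'amuse': 2})
```

Either way the two words collapse into a single feature with a count of 2, which is all a frequency-based representation sees.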
For your model it doesn't matter (in this particular case) whether you use stemming or lemmatization.
Stemming just strips letters from the word, while lemmatization requires a dictionary lookup to find the related word, so stemming is generally faster than lemmatization.
So you can choose stemming over lemmatization if you want to speed up preprocessing.
Disadvantage
With stemming, studying can give study while studies gives studi (the exact output depends on the stemmer).
Even though those words have the same root, they will be treated as different tokens.
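The mismatch is easy to reproduce with a naive suffix stripper. The sketch below uses a hypothetical toy_stem function with a made-up suffix list; it is not how any particular library stems words, and established algorithms such as the Porter stemmer apply extra rules, so their output for this pair may differ.

```python
# Toy suffix list. An illustrative assumption, not a real stemming algorithm.
SUFFIXES = ("ing", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(toy_stem("studying"))  # study
print(toy_stem("studies"))   # studi
```

Because the two stems differ, a frequency-based model would count them as two separate features even though both words come from study.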
Upvotes: 2