Hiding

Reputation: 278

Is there a quicker snowball stemmer in python 3.6 than NLTK's?

I am currently using NLTK's SnowballStemmer to stem the words in my documents, and this worked fine when I had 68 documents. Now I have 4000 documents and it is far too slow. I read another post where someone suggested using PyStemmer, but that is not offered for Python 3.6. Are there any other packages that would do the trick? Or maybe there is something I can do in the code to speed up the process.

Code:

import nltk
from sklearn.feature_extraction.text import CountVectorizer

eng_stemmer = nltk.stem.SnowballStemmer('english')
...
class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        # Use CountVectorizer's default analyzer for preprocessing and tokenization
        analyzer = super(StemmedCountVectorizer, self).build_analyzer()
        # Stem every token the analyzer produces
        return lambda doc: [eng_stemmer.stem(w) for w in analyzer(doc)]
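
For context, the vectorizer is used roughly like this (docs stands for the list of raw document strings; the name is illustrative), so the stemmer gets called once per token across all 4000 documents:

# docs is assumed to be a list of the raw document strings
vectorizer = StemmedCountVectorizer()
X = vectorizer.fit_transform(docs)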

Upvotes: 1

Views: 1643

Answers (1)

user9306022

Reputation:

PyStemmer's documentation does not say that it works with Python 3.6, but it actually does. Install the Visual Studio C++ Build Tools compatible with Python 3.6, which you can find here: http://landinghub.visualstudio.com/visual-cpp-build-tools

Then try pip install pystemmer

If that doesn't work, make sure you install it manually, exactly as described here: https://github.com/snowballstem/pystemmer
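
Once it installs, a minimal sketch of dropping it into your vectorizer could look like this (PyStemmer is imported as the Stemmer module, and stemWords stems a whole list of tokens at once):

import Stemmer
from sklearn.feature_extraction.text import CountVectorizer

# PyStemmer's Snowball stemmers are implemented in C, which is the speed-up
eng_stemmer = Stemmer.Stemmer('english')

class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        analyzer = super(StemmedCountVectorizer, self).build_analyzer()
        # stemWords processes the full token list in one call
        return lambda doc: eng_stemmer.stemWords(analyzer(doc))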

Upvotes: 1
