Reputation: 278
I am currently using NLTK's SnowballStemmer to stem the words in my documents, and this worked fine when I had 68 documents. Now I have 4000 documents and it is far too slow. I read another post where someone suggested using PyStemmer, but it is not offered for Python 3.6. Are there any other packages that would do the trick? Or maybe there's something I can do in the code to speed up the process.
Code:
import nltk
from sklearn.feature_extraction.text import CountVectorizer

eng_stemmer = nltk.stem.SnowballStemmer('english')
...
class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        # Wrap the default analyzer so every token gets stemmed.
        analyzer = super(StemmedCountVectorizer, self).build_analyzer()
        return lambda doc: [eng_stemmer.stem(w) for w in analyzer(doc)]
Upvotes: 1
Views: 1643
Reputation:
PyStemmer's documentation does not say that it works with Python 3.6, but it actually does. Install the Visual Studio C++ Build Tools compatible with Python 3.6, which you can find here: http://landinghub.visualstudio.com/visual-cpp-build-tools
Then try pip install pystemmer
If that doesn't work, make sure you install it manually, exactly as described here: https://github.com/snowballstem/pystemmer
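Once it is installed, PyStemmer can be dropped into the vectorizer from the question. Here is a minimal sketch (the class name PyStemmedCountVectorizer is just illustrative); it uses PyStemmer's Stemmer module, whose stemWords call stems a whole list of tokens in one pass through the C extension rather than making one Python call per word:

import Stemmer
from sklearn.feature_extraction.text import CountVectorizer

english_stemmer = Stemmer.Stemmer('english')

class PyStemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        analyzer = super(PyStemmedCountVectorizer, self).build_analyzer()
        # Stem the whole token list in a single call into the C extension.
        return lambda doc: english_stemmer.stemWords(analyzer(doc))

PyStemmer's Stemmer objects also keep an internal cache of recently stemmed words, which should help further since the same vocabulary repeats across 4000 documents.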
Upvotes: 1