Reputation: 3029
Both the NLTK stemmer and the lemmatizer raise this error for certain sentences from my text file. What does it mean, and how do I fix it?
Traceback (most recent call last):
  File "preproc.py", line 89, in <module>
    apos=stem_data(nostop)
  File "preproc.py", line 51, in stem_data
    r=stemmer.stem(n)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/stem/porter.py", line 632, in stem
    stem = self.stem_word(word.lower(), 0, len(word) - 1)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/stem/porter.py", line 590, in stem_word
    word = self._step1ab(word)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/stem/porter.py", line 275, in _step1ab
    if word.endswith("sses"):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6: ordinal not in range(128)
Upvotes: 0
Views: 351
Reputation: 3880
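Your input contains a non-ASCII character, so this is an encoding issue. In Python 2, the byte string is implicitly decoded as ASCII inside the stemmer, and that fails on byte 0xe2. It would help to know which sentences produce this error.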
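Here is a rough sketch of how you might track down the offending characters and decode the text before stemming. It assumes your file is UTF-8 encoded, which I can't confirm from the traceback alone (0xe2 is often the first byte of a UTF-8 curly quote or dash, but that is a guess). The helper names `to_text` and `find_non_ascii` are made up for illustration; they are not part of NLTK.

```python
from __future__ import print_function, unicode_literals


def to_text(value, encoding="utf-8"):
    """Decode byte strings to Unicode text; pass Unicode text through unchanged.

    Assumption: the input bytes are UTF-8. Change `encoding` if your file
    uses something else.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def find_non_ascii(value, encoding="utf-8"):
    """Return (index, character) pairs for every non-ASCII character."""
    text = to_text(value, encoding)
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


# Made-up sample: UTF-8 bytes containing a curly apostrophe (U+2019).
sample = b"don\xe2\x80\x99t stop"

# Shows which characters would trip an ASCII decode (here, index 3).
print(find_non_ascii(sample))

# Decode once, up front, then hand Unicode text to the stemmer.
clean = to_text(sample)
```

In your script that would probably mean calling `stemmer.stem(to_text(n))` instead of `stemmer.stem(n)`, or opening the file with `io.open(path, encoding="utf-8")` so every line is already Unicode when you read it.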
Upvotes: 1