minks

Reputation: 3029

How do I fix this UnicodeDecodeError?

Both the stemmer and the lemmatizer raise this error for certain sentences read from my text file. What does it mean, and how do I solve it?

    Traceback (most recent call last):
      File "preproc.py", line 89, in <module>
        apos=stem_data(nostop)
      File "preproc.py", line 51, in stem_data
        r=stemmer.stem(n)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/stem/porter.py", line 632, in stem
        stem = self.stem_word(word.lower(), 0, len(word) - 1)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/stem/porter.py", line 590, in stem_word
        word = self._step1ab(word)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/stem/porter.py", line 275, in _step1ab
        if word.endswith("sses"):
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6: ordinal not in range(128)

Upvotes: 0

Views: 351

Answers (1)

ubadub

Reputation: 3880

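Your input contains a non-ASCII character, so this is an encoding issue. If the file is UTF-8, byte 0xe2 starts a three-byte sequence, which is commonly a curly quote or a dash, and the text is probably reaching the stemmer as an undecoded byte string. It would help to know which sentences produce this error.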
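If that is the cause, decoding the text to Unicode before it reaches the stemmer should avoid the implicit ASCII decode. Here is a minimal sketch, assuming the text file is UTF-8 (the question does not say, so adjust the encoding if needed). The helper names `to_text` and `read_lines` are made up for illustration and are not part of NLTK or of your `preproc.py`.

```python
# -*- coding: utf-8 -*-
import io


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding byte strings first.

    Hypothetical helper. The default encoding is an assumption.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is the plain str type
        return value.decode(encoding)
    return value


def read_lines(path, encoding="utf-8"):
    """Read a text file as Unicode lines (hypothetical helper).

    io.open decodes while reading on both Python 2 and Python 3.
    """
    with io.open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


# b"\xe2\x80\x99" is the UTF-8 encoding of U+2019, a right single quotation mark
raw = b"don\xe2\x80\x99t"
word = to_text(raw)
```

In `stem_data`, calling `stemmer.stem(to_text(n))` instead of `stemmer.stem(n)` would then pass Unicode to the stemmer. Reading the file with `io.open(..., encoding="utf-8")` up front is usually the cleaner option, since every later step then works on Unicode.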

Upvotes: 1
