Reputation: 45
I'm trying to run through the TextBlob tutorial in Windows (using Git Bash shell) with Python 3.3.
I've installed textblob
and nltk
as well as any dependencies.
The Python code is:
from text.blob import TextBlob
wiki = TextBlob("Python is a high-level, general-purpose programming language.")
tags = wiki.tags
I'm getting the following error
Traceback (most recent call last):
File "textblob.py", line 4, in <module>
tags = wiki.tags
File "c:\Python33\lib\site-packages\text\decorators.py", line 18, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "c:\Python33\lib\site-packages\text\blob.py", line 357, in pos_tags
for word, t in self.pos_tagger.tag(self.raw)
File "c:\Python33\lib\site-packages\text\taggers.py", line 40, in tag
return pattern_tag(sentence, tokenize)
File "c:\Python33\lib\site-packages\text\en.py", line 115, in tag
for sentence in parse(s, tokenize, True, False, False, False, encoding).split():
File "c:\Python33\lib\site-packages\text\en.py", line 99, in parse
return parser.parse(unicode(s), *args, **kwargs)
File "c:\Python33\lib\site-packages\text\text.py", line 1213, in parse
s[i] = self.find_tags(s[i], **kwargs)
File "c:\Python33\lib\site-packages\text\en.py", line 49, in find_tags
return _Parser.find_tags(self, tokens, **kwargs)
File "c:\Python33\lib\site-packages\text\text.py", line 1161, in find_tags
map = kwargs.get( "map", None))
File "c:\Python33\lib\site-packages\text\text.py", line 967, in find_tags
tagged.append([token, lexicon.get(token, i==0 and lexicon.get(token.lower()) or None)])
File "c:\Python33\lib\site-packages\text\text.py", line 98, in get
return self._lazy("get", *args)
File "c:\Python33\lib\site-packages\text\text.py", line 79, in _lazy
self.load()
File "c:\Python33\lib\site-packages\text\text.py", line 367, in load
dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip()))
File "c:\Python33\lib\site-packages\text\text.py", line 367, in <genexpr>
dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip()))
File "c:\Python33\lib\site-packages\text\text.py", line 346, in _read
for line in f:
File "c:\Python33\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 16: character maps to <undefined>
Any idea what is wrong here? Adding a 'u'
before the string didn't help.
Upvotes: 3
Views: 1779
Reputation: 1714
Release 0.7.1 fixes this issue, which means it's time for a
$ pip install -U textblob
The problem was that the en-lexicon.txt
file used for part-of-speech tagging opened the file using Windows' default platform encoding, cp1252. The file apparently had characters that Python could not decode from this encoding. This was fixed by explicitly opening the file in utf-8 mode.
Upvotes: 3