Reputation: 3
When I pass a parameter containing accented characters to my Scrapy spider, it crashes with this error:
Traceback (most recent call last):
File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 577, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "C:\Users\DoricTsappi\Downloads\newproject\myscrap\myscrap\spiders\scrap2.py", line 44, in parse
occdoc = text.lower().count(key.lower())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 1: ordinal not in range(128)
Here is my code:
# -*- coding: utf-8 -*-
Upvotes: 0
Views: 1912
Reputation: 1863
As Andrea already mentioned, this is what happens when you mix unicode and str objects. The Scrapy documentation (http://doc.scrapy.org/en/latest/topics/selectors.html) says that methods like .css() return unicode objects, so text is of type unicode and key must be of type str. Python 2 then tries to decode key implicitly with the ascii codec, which fails on the byte 0xe9. 0xe9 is most likely the character é encoded in the native Windows text encoding. To fix this, convert your key to Unicode:
# decode each key from iso-8859-1 (better known as latin-1) to unicode
keys = self.mot.strip().split(self.sep)
for key in map(lambda x: unicode(x, encoding="iso-8859-1"), keys):
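To see the decode step in isolation, here is a minimal sketch outside of Scrapy. The names raw_keys and text are hypothetical stand-ins for the split self.mot values and the selector output, and iso-8859-1 is an assumption about how the keys were encoded. It uses .decode() instead of unicode(), so it also runs on Python 3:

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-ins: raw_keys plays the role of self.mot split on self.sep,
# text plays the role of the unicode text returned by the selector.
raw_keys = [b"caf\xe9", b"th\xe9"]  # byte strings, assumed to be latin-1 encoded
text = u"Un Caf\xe9 et un th\xe9, puis un autre caf\xe9."

# Decode every key once, up front, so only unicode objects are compared below.
keys = [k.decode("iso-8859-1") for k in raw_keys]

# Same counting expression as in the traceback, now unicode against unicode.
counts = {key: text.lower().count(key.lower()) for key in keys}
```

If the keys actually arrive in another encoding (cp1252 or utf-8, for example), only the argument to decode() needs to change.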
Here are two links that might help you identify such issues on your own:
http://www.joelonsoftware.com/articles/Unicode.html
https://pythonhosted.org/kitchen/unicode-frustrations.html
FYI: This fix only works on Python 2.x. Python 3 has no unicode built-in, and its str type is already Unicode.
Upvotes: 1