Scrapy: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 1: ordinal not in range(128)

When I pass a parameter containing accented characters to my Scrapy spider, it crashes with this error:

Traceback (most recent call last):  
  File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)  
  File "C:\Users\DoricTsappi\Downloads\newproject\myscrap\myscrap\spiders\scrap2.py", line 44, in parse
    occdoc = text.lower().count(key.lower())  
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 1: ordinal not in range(128)

Here is my code:

# -*- coding: utf-8 -*-

Answers (1)

HelloWorld

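As Andrea already mentioned, this is what happens when you mix unicode and str objects. The Scrapy documentation (http://doc.scrapy.org/en/latest/topics/selectors.html) says that methods like .css() return unicode objects, so text is of type unicode, which means key must be the str (byte string) in that expression. When Python 2 combines the two, it implicitly decodes the str with the ascii codec, and that fails on any byte above 0x7f.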
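To make the failure concrete, here is a minimal sketch that performs the implicit ascii decoding step explicitly, so it behaves the same on Python 2 and Python 3. The sample values of text and key are made up for illustration and are not taken from the spider in the question.

```python
# Hypothetical sample values, chosen so that byte 0xe9 sits at position 1.
text = u"T\xe9l\xe9phone et t\xe9l\xe9vision"  # unicode, like text extracted by a selector
key = b"t\xe9l\xe9"                            # byte string encoded as latin-1

# Python 2 does this decoding implicitly when a str meets a unicode object.
try:
    key.decode("ascii")
    ascii_failed = False
    error_message = ""
except UnicodeDecodeError as exc:
    ascii_failed = True
    error_message = str(exc)

# Decoding with the codec the bytes were actually written in works.
decoded_key = key.decode("iso-8859-1")
occurrences = text.lower().count(decoded_key.lower())
```

Once both operands are unicode, the count runs without any hidden decoding.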

0xe9 is most likely the character é in the native Windows text encoding (cp1252, which agrees with latin-1 for this byte). So to fix this, just convert your key to unicode:

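# Decode each byte-string key to unicode; iso-8859-1 is better known as latin-1
keys = self.mot.strip().split(self.sep)
for key in (unicode(k, encoding="iso-8859-1") for k in keys):
    # both operands are unicode now, so no implicit ascii decoding takes place
    occdoc = text.lower().count(key.lower())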
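A slightly more defensive variant, as a sketch only: the helper names to_unicode and split_keys are hypothetical, and iso-8859-1 is an assumption about how the argument was encoded. It decodes only when the value is still a byte string, so it runs unchanged on Python 2 and Python 3.

```python
def to_unicode(value, encoding="iso-8859-1"):
    """Return value as unicode text, decoding byte strings with `encoding`."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value  # already unicode text, leave untouched


def split_keys(raw, sep, encoding="iso-8859-1"):
    """Decode the raw argument and the separator first, then strip and split."""
    raw = to_unicode(raw, encoding)
    sep = to_unicode(sep, encoding)
    return raw.strip().split(sep)


# Hypothetical input standing in for self.mot and self.sep.
keys = split_keys(b"t\xe9l\xe9,caf\xe9 ", b",")
```

In the spider this would be called as split_keys(self.mot, self.sep), and every key in the result can be compared against text directly.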

Here are two links which might help you to identify such issues on your own:

http://www.joelonsoftware.com/articles/Unicode.html

https://pythonhosted.org/kitchen/unicode-frustrations.html

FYI: This fix only works on Python 2.x. In Python 3, str is already Unicode and the unicode() built-in no longer exists.
