otteheng

Reputation: 604

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

I'm scraping Oregon Teacher License data for a project. Here's my code:

educ_employ = tree.xpath('//tr[15]//td[@bgcolor="#A9EDFC"]//text()')
print educ_employ
#[u'Jefferson Middle School\xa0\xa0(2013 - 2014)']

I want to strip out the "\xa0" characters. This is my code:

educ_employ = ([s.strip('\xa0') for s in educ_employ])
print educ_employ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

I tried this:

educ_employ = ([s.decode('utf-8').strip('\xa0') for s in educ_employ])
print educ_employ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

And this:

import sys

reload(sys)
sys.setdefaultencoding('utf-8')

educ_employ = tree.xpath('//tr[15]//td[@bgcolor="#A9EDFC"]//text()')
educ_employ = ([s.decode('utf-8').strip('\xa0') for s in educ_employ])
print educ_employ
>>>

I didn't get an error with the last one, but I also didn't get any output. I'm using Python 2.7. Does anyone know how to fix this?

Upvotes: 1

Views: 4240

Answers (1)

Robᵩ

Reputation: 168796

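You are mixing up unicode objects and str objects. The elements of educ_employ are unicode objects, but '\xa0' is a str (a byte string). When you pass a str to a unicode method, Python 2 first tries to decode it with the default ASCII codec, and the byte 0xa0 is outside the ASCII range, hence the UnicodeDecodeError.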

Additionally, .strip() only removes characters from the beginning and end of the string, not the middle. Try .replace() instead.

Try:

educ_employ = [u'Jefferson Middle School\xa0\xa0(2013 - 2014)']
educ_employ = [s.replace(u'\xa0', u'') for s in educ_employ]
print educ_employ
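
As a rough illustration of the points above (a sketch, not the only way to do it): .strip() leaves no-break spaces in the middle of a string untouched, while .replace() removes them. The helper name normalize_nbsp is hypothetical, and collapsing each run of no-break spaces to a single ordinary space is an assumption about the desired output, since deleting them outright joins "School" and "(2013" together.

```python
# -*- coding: utf-8 -*-
import re

NBSP = u'\xa0'  # no-break space, written as a unicode literal


def normalize_nbsp(values):
    # Hypothetical helper (not part of lxml or the standard library):
    # collapse each run of no-break spaces into one ordinary space.
    return [re.sub(NBSP + u'+', u' ', s) for s in values]


educ_employ = [u'Jefferson Middle School\xa0\xa0(2013 - 2014)']

# strip() only looks at the ends, so the mid-string NBSPs survive
stripped = [s.strip(NBSP) for s in educ_employ]

# replace() removes every NBSP, wherever it occurs
removed = [s.replace(NBSP, u'') for s in educ_employ]

# assumed alternative: keep one regular space between the two parts
spaced = normalize_nbsp(educ_employ)
```

Keeping every literal prefixed with u means no implicit ASCII decoding takes place on Python 2.7, and the same code should also run unchanged on Python 3.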

Upvotes: 3
