Reputation: 604
I'm working on scraping Oregon Teacher License data for a project I'm doing. Here's my code:
educ_employ = tree.xpath('//tr[15]//td[@bgcolor="#A9EDFC"]//text()')
print educ_employ
#[u'Jefferson Middle School\xa0\xa0(2013 - 2014)']
I want to strip the "\xa0". This is my code:
educ_employ = ([s.strip('\xa0') for s in educ_employ])
print educ_employ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
I tried this:
educ_employ = ([s.decode('utf-8').strip('\xa0') for s in educ_employ])
print educ_employ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
And this:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
educ_employ = tree.xpath('//tr[15]//td[@bgcolor="#A9EDFC"]//text()')
educ_employ = ([s.decode('utf-8').strip('\xa0') for s in educ_employ])
print educ_employ
>>>
I didn't get an error with the last one, but I also didn't get any output. I'm using Python 2.7. Does anyone know how to fix this?
Upvotes: 1
Views: 4240
Reputation: 168796
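You are mixing up unicode objects and str objects. The items in educ_employ are unicode, but '\xa0' is a str. In Python 2, passing a str to a unicode method makes Python decode it implicitly with the ascii codec, and that decode fails on byte 0xa0, which is the UnicodeDecodeError you are seeing.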
Additionally, .strip() only removes characters from the beginning and end of the string, not the middle. Try .replace() instead.
Try:
educ_employ = [u'Jefferson Middle School\xa0\xa0(2013 - 2014)']
educ_employ = [s.replace(u'\xa0', u'') for s in educ_employ]
print educ_employ
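One thing to watch: replacing with an empty string joins the school name directly to the years. If you would rather keep a single ordinary space there, here is a minimal sketch of one way to do it. The helper name clean_nbsp is made up for illustration, and using re.sub is my own choice, not something the original code relies on. It also shows that .strip() leaves the string unchanged when the "\xa0" sits in the middle.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # keeps print() working on Python 2.7 too
import re


def clean_nbsp(text):
    """Hypothetical helper: turn each run of non-breaking spaces into one space."""
    # u'\xa0' is a unicode literal, so no implicit ascii decoding takes place.
    return re.sub(u'\xa0+', u' ', text).strip()


educ_employ = [u'Jefferson Middle School\xa0\xa0(2013 - 2014)']

# strip() only looks at the two ends of the string, so nothing is removed here.
stripped_only = [s.strip(u'\xa0') for s in educ_employ]

# Replacing the run of non-breaking spaces keeps the name and years separated.
cleaned = [clean_nbsp(s) for s in educ_employ]

print(cleaned[0])
```

Printing a single item rather than the whole list also avoids the u'...' repr that Python 2 shows for lists of unicode objects.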
Upvotes: 3