Reputation: 43
I am extracting data from a Google spreadsheet using the Spreadsheet API in Python. I can print every row of the spreadsheet on the command line with a for loop, but some of the text contains symbols, e.g. the degree Celsius symbol (a little circle). As I print these rows on the command line, I also want to write them to a file, but I get various Unicode errors when I do this. I tried to solve it by replacing the offending characters manually, but there are too many:
current=current.replace(u'\xa0',u'')
current=current.replace(u'\u000a',u'p')
current=current.replace(u'\u201c',u'\"')
current=current.replace(u'\u201d',u'\"')
current=current.replace(u'\u2014',u'-')
current=current.replace(u'\u0446',u'u')
What can I do so that I don't get errors such as the following?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 1394: ordinal not in range(128)
Upvotes: 2
Views: 3966
Reputation: 2254
import unicodedata
# text must already be a unicode string; the result is a plain ASCII byte string
ascii_text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
I'm not sure that the normalize step is needed in this case. Also, the 'ignore' option means that you might lose some information, because any character that cannot be encoded is silently dropped.
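To make the information loss concrete, here is a minimal sketch of the normalize-then-ignore approach. It assumes the value is already a unicode string (which the UnicodeEncodeError in the question suggests) and therefore calls encode rather than decode; the helper name to_ascii is made up for illustration.

```python
import unicodedata


def to_ascii(text):
    """Return an ASCII-only byte string built from a unicode string.

    Hypothetical helper for illustration; `text` is assumed to be unicode.
    """
    # NFKD splits characters that have a decomposition into simpler parts,
    # e.g. an accented letter becomes the base letter plus a combining mark.
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' silently drops every code point that has no ASCII equivalent,
    # including the combining marks produced above.
    return decomposed.encode('ascii', 'ignore')
```

With this helper, u'caf\xe9' comes out as the bytes cafe, but u'25\xb0C' comes out as 25C: the degree sign has no decomposition, so it is simply dropped. That is probably not what you want for temperature data.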
Upvotes: 0
Reputation: 165242
You want to decode it from whatever encoding it's in:
decoded_str = encoded_str.decode('utf-8')
For more information on how to deal with Unicode strings, read the Python Unicode HOWTO: http://docs.python.org/howto/unicode.html
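Since the error in the question is raised while writing, the other half of the round trip may help too: encode explicitly when the text leaves the program. The following is only a sketch, assuming the rows are unicode strings and that UTF-8 is an acceptable file encoding; write_rows and read_rows are hypothetical names, not part of the Spreadsheet API.

```python
import io


def write_rows(rows, path):
    """Write unicode strings to `path`, one per line, encoded as UTF-8."""
    # io.open takes an explicit encoding on both Python 2 and Python 3,
    # so the implicit ASCII conversion that raises UnicodeEncodeError
    # never happens.
    with io.open(path, 'w', encoding='utf-8') as handle:
        for row in rows:
            handle.write(row + u'\n')


def read_rows(path):
    """Read the file back as a list of unicode strings."""
    with io.open(path, 'r', encoding='utf-8') as handle:
        return [line.rstrip(u'\n') for line in handle]
```

Called as write_rows(rows, 'rows.txt'), this keeps characters such as the degree sign, curly quotes and non-breaking spaces intact, so none of the manual replace calls are needed.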
Upvotes: 5