Tyler Durden

Reputation: 43

Python saving string to file. Unicode error

I am extracting data from a Google spreadsheet using the Spreadsheet API in Python. I can print every row of the spreadsheet on the command line with a for loop, but some of the text contains symbols, e.g. the degree symbol in Celsius temperatures (the little circle). As I print these rows, I also want to write them to a file, but I get various Unicode errors when I do. I tried fixing it manually by replacing the offending characters one by one, but there are too many:

current=current.replace(u'\xa0',u'')
current=current.replace(u'\u000a',u'p')
current=current.replace(u'\u201c',u'\"')
current=current.replace(u'\u201d',u'\"')
current=current.replace(u'\u2014',u'-')

What can I do so that I don't get these errors? For example:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 1394: ordinal not in range(128)

current=current.replace(u'\u0446',u'u')

Upvotes: 2

Views: 3966

Answers (3)

Blubber

Reputation: 2254

import unicodedata
# Decode the bytes first: unicodedata.normalize() only accepts unicode text.
decoded = unicodedata.normalize('NFKD', encoded.decode('UTF-8', 'ignore'))

I'm not quite sure that the normalize step is needed in this case. Also, the 'ignore' option means that you might lose some information, because decoding errors will be silently ignored.
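
To make the information loss concrete, here is a minimal sketch. It assumes the text is already a unicode string and uses encode('ascii', 'ignore') as the lossy step, which is swapped in for illustration and is not the exact call above. The helper name to_ascii and the sample strings are made up.

```python
import unicodedata

def to_ascii(text):
    """Decompose characters (NFKD), then drop whatever ASCII cannot represent."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore')

# An accented letter decomposes into a base letter plus a combining mark,
# so the base letter survives and only the accent is dropped.
accented = to_ascii(u'caf\xe9')

# The degree sign has no ASCII decomposition, so it disappears entirely.
degrees = to_ascii(u'25\xb0C')

# A no-break space decomposes into a plain space.
spaced = to_ascii(u'a\xa0b')
```

So normalization helps with accents and no-break spaces, but symbols such as the degree sign are still thrown away.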

Upvotes: 0

Yuval Adam

Reputation: 165242

You want to decode it from whatever encoding it's in:

decoded_str = encoded_str.decode('utf-8')

For more information on how to deal with Unicode strings, read the Python Unicode HOWTO: http://docs.python.org/howto/unicode.html
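
Note that the traceback in the question is a UnicodeEncodeError, which suggests the rows are already unicode objects and the failing step is the implicit ASCII encoding when they are written out. A minimal sketch of writing with an explicit encoding, assuming that is the case; the function name write_rows, the sample rows, and the file name are hypothetical:

```python
import io
import os
import tempfile

def write_rows(rows, path):
    """Write each unicode row on its own line, encoded as UTF-8."""
    # io.open encodes on write, so no character has to be replaced by hand.
    with io.open(path, 'w', encoding='utf-8') as handle:
        for row in rows:
            handle.write(row + u'\n')

# Sample rows containing a degree sign, a no-break space and curly quotes.
rows = [u'25\xb0C\xa0outside', u'\u201cquoted\u201d text']
path = os.path.join(tempfile.gettempdir(), 'rows_demo.txt')
write_rows(rows, path)

# Read the file back with the same encoding to check nothing was lost.
with io.open(path, 'r', encoding='utf-8') as handle:
    read_back = handle.read().splitlines()
```

UTF-8 can represent every Unicode character, so this keeps the symbols instead of stripping them; whether UTF-8 is the right target depends on what will read the file.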

Upvotes: 5

Niklas R

Reputation: 16870

''.join(c for c in current if ord(c) < 128)
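
To see what this filter actually discards, here is a small sketch, assuming current is a unicode string; the sample value is made up:

```python
# Sample text with a degree sign, a no-break space and curly quotes.
current = u'25\xb0C\xa0outside \u201cquoted\u201d'

# Keep only characters in the ASCII range (code points below 128).
ascii_only = u''.join(c for c in current if ord(c) < 128)
```

The result is u'25Coutside quoted': the write no longer fails, but the degree sign, the no-break space and the quotes are all gone.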

Upvotes: -1
