Reputation: 3
I have some problems converting from unicode to str in Python. To give some context:
$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "αά".decode('utf-8')
u'\u03b1\u03ac'
>>> u"αά".encode('utf-8')
'\xce\xb1\xce\xac'
Now, for some strange reason, I have a library function which, in the case of αά, returns the string u'\xce\xb1\xce\xac', and I need to get the string u'\u03b1\u03ac'. Nothing I have tried works. If I try decode, it gives me an error:
>>> u'\xce\xb1\xce\xac'.decode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
So I need a way to turn u'\xce\xb1\xce\xac' into '\xce\xb1\xce\xac'. It does not work with str:
>>> str(u'\xce\xb1\xce\xac')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
Any ideas on how to do it are welcome.
Upvotes: 0
Views: 715
Reputation: 3410
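It appears your input is double-encoded: each character of the unicode string holds one byte of the UTF-8 encoding. So you should turn it back into bytes and then decode those as UTF-8: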
>>> u'\xce\xb1\xce\xac'.encode('raw_unicode_escape').decode('utf8')
u'\u03b1\u03ac'
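The same repair can be wrapped in a small function. This is only a sketch: the name fix_mojibake is made up for illustration, and it uses 'latin-1' instead of 'raw_unicode_escape', on the assumption that every character in the broken string has a code point below 256 (which holds for the string in the question). For such input both codecs should produce the same bytes.

```python
def fix_mojibake(text):
    """Hypothetical helper: repair a unicode string whose characters are
    really the individual bytes of a UTF-8 encoded string."""
    # latin-1 maps code points 0-255 back to the identical byte values,
    # so this recovers the original UTF-8 byte sequence.
    raw_bytes = text.encode('latin-1')
    # Decode the recovered bytes with the encoding they were written in.
    return raw_bytes.decode('utf-8')


broken = u'\xce\xb1\xce\xac'   # what the library function returns
fixed = fix_mojibake(broken)   # should equal u'\u03b1\u03ac'
```

If the string contains a character above U+00FF, the encode step raises UnicodeEncodeError, which is a reasonable signal that the string was not broken in this particular way.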
At first I thought it was an issue with your terminal encoding, which would not accept printing 'αά'.decode('utf8').
...
See the related post:
Sorry for my mistakes.
Upvotes: 2