Takacs Adrian

Reputation: 3

unicode to str in python 2.7.3

I am having trouble converting from unicode to str in Python. To give some context:

$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "αά".decode('utf-8')
u'\u03b1\u03ac'
>>> u"αά".encode('utf-8')
'\xce\xb1\xce\xac'

Now, for some strange reason, I have a library function which, in the case of αά, returns the string u'\xce\xb1\xce\xac', and I need to get the string u'\u03b1\u03ac'. Nothing I have tried works. If I try decode, I get an error:

>>> u'\xce\xb1\xce\xac'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

So I need a way to turn u'\xce\xb1\xce\xac' into '\xce\xb1\xce\xac'. It does not work with str:

>>> str(u'\xce\xb1\xce\xac')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

Any ideas on how to do it are welcome.

Upvotes: 0

Views: 715

Answers (1)

bufh

Reputation: 3410

Edited

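It appears your input is double-encoded (the UTF-8 bytes ended up as individual code points inside a unicode object), so you should turn it back into bytes and then decode those as UTF-8: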

>>> u'\xce\xb1\xce\xac'.encode('raw_unicode_escape').decode('utf8')
u'\u03b1\u03ac'
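If this comes up in more than one place, the same round trip can be wrapped in a small helper. This is only a sketch: `fix_mojibake` is a hypothetical name, and it uses the 'latin-1' codec instead of 'raw_unicode_escape', on the assumption that every code point in the broken string is below 256 (Latin-1 maps code points 0-255 to the identical byte values). Strings that do not fit that pattern are returned unchanged.

```python
def fix_mojibake(text):
    """Hypothetical helper: undo UTF-8 bytes that were stored as code points."""
    try:
        # Latin-1 turns each code point 0-255 back into the same byte value.
        raw = text.encode('latin-1')
        # Those bytes are the real UTF-8 data, so decode them properly.
        return raw.decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a code point above 255 or bytes that are not valid UTF-8:
        # assume the string was not broken in this way and leave it alone.
        return text


broken = u'\xce\xb1\xce\xac'
fixed = fix_mojibake(broken)
print(repr(fixed))
```

Because the helper falls back to the original value on any encode or decode error, it should be safe to call on strings that are already correct, such as u'\u03b1\u03ac' itself.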

At first I thought it was an issue with your terminal encoding, which would not print 'αά'.decode('utf8')...

See the related post:

Sorry for my mistakes.

Upvotes: 2
