Reputation: 75
I am using Python 2.6.6, and my locale is ('en_US', 'UTF8').
I have tried many ways to convert a UTF-8 string to Big5, but none of them work. If you know how to do this, please give me some advice. Thanks a lot.
Take the Chinese word '單車', which means 'bicycle'.
Its Unicode code points are \u55ae\u8eca:
str_a = u'\u55ae\u8eca'
str_b = '\u55ae\u8eca'
print str_a # output '單車'
print str_b # output '\u55ae\u8eca'
I know that str_a works, but I want to convert str_b to Big5 as well.
I have tried decode, encode, and unicode, but none of them work.
Any good ideas? Thanks.
Upvotes: 1
Views: 12234
Reputation: 879481
str_b is a sequence of bytes:
In [19]: list(str_b)
Out[19]: ['\\', 'u', '5', '5', 'a', 'e', '\\', 'u', '8', 'e', 'c', 'a']
The backslash, the u, and so forth are all just separate characters. Compare that to the sequence of Unicode code points in the unicode object str_a:
In [24]: list(str_a)
Out[24]: [u'\u55ae', u'\u8eca']
To convert the malformed string str_b to unicode, decode it with unicode-escape:
In [20]: str_b.decode('unicode-escape')
Out[20]: u'\u55ae\u8eca'
In [21]: print(str_b.decode('unicode-escape'))
單車
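Once the escapes have been decoded, getting Big5 is one more encode call. Here is a minimal sketch of the whole chain, assuming the input really is a byte string of literal \uXXXX escapes like str_b. The helper name escaped_to_big5 is made up for illustration, and the b prefix with doubled backslashes is only there so the same literal works on Python 2.6+ and Python 3:

```python
# Sketch only: escaped_to_big5 is a hypothetical helper name, not from the thread.

def escaped_to_big5(escaped):
    """Turn a byte string of literal \\uXXXX escapes into Big5-encoded bytes."""
    text = escaped.decode('unicode-escape')  # bytes -> unicode text
    return text.encode('big5')               # unicode text -> Big5 bytes

# Same content as str_b in the question: twelve separate characters,
# starting with a backslash and a 'u'.
str_b = b'\\u55ae\\u8eca'
big5_bytes = escaped_to_big5(str_b)

print(len(big5_bytes))  # each of the two characters takes two bytes in Big5
```

The result is a byte string, so decode it with 'big5' again before printing it on a UTF-8 terminal.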
Upvotes: 5
Reputation: 80031
You should be able to do this:
str_a = u'\u55ae\u8eca'
str_b = str_a.encode('big5')
print str_a
print str_b.decode('big5')
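The encoded value is a byte string rather than text, which is presumably why the snippet above decodes it again before printing. A small sketch of that round trip, written so it runs on both Python 2.6+ and Python 3; the variable names are illustrative only:

```python
text = u'\u55ae\u8eca'          # two Unicode code points
encoded = text.encode('big5')   # Big5 byte string

print(len(text))                # number of code points
print(len(encoded))             # number of bytes
print(repr(encoded))            # raw bytes, safe to show on any terminal

# Decoding with the same codec restores the original text.
restored = encoded.decode('big5')
print(restored == text)
```

Printing the raw Big5 bytes directly on a UTF-8 terminal would likely show garbage, so keep the text as unicode for display and encode only when writing Big5 output.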
Upvotes: 3