How can I get b'\xe3\x81\x82' from '\xe3\x81\x82'?
Ultimately, I want u'\u3042', the Japanese letter 'あ'.
b'\xe3\x81\x82'.decode('utf-8') gives u'\u3042', but '\xe3\x81\x82'.decode('utf-8') raises the following error:
AttributeError: 'str' object has no attribute 'decode'
This happens because b'\xe3\x81\x82' is bytes, while '\xe3\x81\x82' is str.
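To see why the two literals behave differently, here is a small sketch (Python 3 assumed) comparing them. The str contains three code points (U+00E3, U+0081, U+0082) whose numeric values happen to match the three UTF-8 bytes of 'あ'. Only bytes has a .decode() method.

```python
# The two literals look alike but have different types.
raw = b'\xe3\x81\x82'   # bytes: three raw byte values
text = '\xe3\x81\x82'   # str: three code points U+00E3, U+0081, U+0082

print(type(raw).__name__, len(raw))    # bytes 3
print(type(text).__name__, len(text))  # str 3

# bytes has .decode(); str only has .encode().
print(hasattr(raw, 'decode'))   # True
print(hasattr(text, 'decode'))  # False

# Each code point in `text` equals the corresponding byte value in `raw`.
print([ord(c) for c in text] == list(raw))  # True
```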
My database contains data like '\xe3\x81\x82'.
If you have bytes disguised as Unicode code points, encode to Latin-1 to get the original bytes back, then decode those as UTF-8:
'\xe3\x81\x82'.encode('latin1').decode('utf-8')
Latin-1 (ISO-8859-1) maps the first 256 Unicode code points (U+0000 to U+00FF) one-to-one onto the byte values 0 to 255:
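A quick sketch to check that one-to-one mapping: encoding all 256 code points U+0000 to U+00FF with Latin-1 yields exactly the bytes 0 to 255 in order, and decoding reverses it. That is why the encode step recovers the original UTF-8 bytes unchanged.

```python
# Latin-1 maps code point U+00NN to byte 0xNN for every N in 0..255.
all_code_points = ''.join(chr(i) for i in range(256))
encoded = all_code_points.encode('latin1')
print(encoded == bytes(range(256)))  # True: one-to-one and order-preserving

# Applied to the question's string, this recovers the original UTF-8 bytes.
recovered = '\xe3\x81\x82'.encode('latin1')
print(recovered == b'\xe3\x81\x82')           # True
print(recovered.decode('utf-8') == '\u3042')  # True

# The mapping is reversible: decoding as Latin-1 gives back the str.
print(recovered.decode('latin1') == '\xe3\x81\x82')  # True
```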
>>> '\xe3\x81\x82'.encode('latin1').decode('utf-8')
'あ'
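For rows coming out of a database, some values may already be correct, or may not be valid UTF-8 once re-encoded. Below is a minimal sketch of a hypothetical helper, `fix_mojibake` (the name and the sample rows are illustrative, not from the question). It applies the same encode/decode trick and leaves a value unchanged if either step fails.

```python
def fix_mojibake(value: str) -> str:
    """Reinterpret a str whose code points are really UTF-8 bytes.

    Hypothetical helper: returns the value unchanged if it contains code
    points above U+00FF (encode fails) or if the resulting bytes are not
    valid UTF-8 (decode fails).
    """
    try:
        return value.encode('latin1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


# Hypothetical rows, as they might come back from the database.
rows = ['\xe3\x81\x82', 'abc', 'あ', '\xe9']
print([fix_mojibake(r) for r in rows])  # ['あ', 'abc', 'あ', 'é']
```

Be careful about applying this blindly. A genuine Latin-1 string whose bytes happen to form valid UTF-8 (for example 'Ã©') would also be converted (to 'é'). It is safest to apply it only to columns you know were mis-decoded.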