Reputation: 1049
In Py2:
(chr(145) + chr(78)).decode('utf-16')
I got u'\u4e91'.
But in Py3:
(chr(145) + chr(78)).encode('utf-8').decode('utf-16')
I got an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x4e in position 2: truncated data
Sometimes the two versions give the same result, for example with (chr(93) + chr(78)), but sometimes they do not.
Why? And how can I do this right in Py3?
Upvotes: 1
Views: 156
Reputation: 24232
You have to use latin1 if you want to encode each character as a single byte transparently (it maps code points 0-255 to the same byte values):
(chr(145) + chr(78)).encode('latin1').decode('utf-16')
#'云'
chr(145) gets encoded as 2 bytes in utf8 (every code point above 127 takes at least 2 bytes):
chr(145).encode('utf8')
# b'\xc2\x91'
whereas latin1 gives you the single byte you wanted:
chr(145).encode('latin1')
# b'\x91'
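If the goal is just to decode two raw byte values, here is a minimal sketch that skips the str round trip and builds a bytes object directly. The variable names are only illustrative, and spelling out 'utf-16-le' is my assumption about the intended byte order (it matches the codec named in the traceback); plain 'utf-16' without a BOM falls back to the platform's native order, as far as I know.

```python
# Build the two bytes directly instead of encoding a str
raw = bytes([145, 78])                      # b'\x91N'

# Assumption: little-endian is the intended byte order
text = raw.decode('utf-16-le')              # U+4E91

# latin1 maps code points 0-255 to identical byte values
via_latin1 = (chr(145) + chr(78)).encode('latin1')

# utf-8 needs two bytes for U+0091, so three bytes come out in total,
# which is why the utf-16 decoder complains about truncated data
via_utf8 = (chr(145) + chr(78)).encode('utf-8')

print(text, via_latin1, via_utf8)
```

This also explains why (chr(93) + chr(78)) worked either way: both code points are below 128, so utf8 and latin1 produce the same two bytes.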
Upvotes: 2