pingze

Reputation: 1049

Why can't I decode('utf-16') successfully in Python 3 (even though it works in Py2)?

In Py2:

(chr(145) + chr(78)).decode('utf-16')

I got u'\u4e91'.

But in Py3:

(chr(145) + chr(78)).encode('utf-8').decode('utf-16')

I got an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x4e in position 2: truncated data

Sometimes the two versions behave the same way, for example with (chr(93) + chr(78)), but sometimes they don't.

Why? And how can I do this right in Py3?

Upvotes: 1

Views: 156

Answers (1)

Thierry Lathuille

Reputation: 24232

You have to use latin1 if you want to encode any byte value transparently:

(chr(145) + chr(78)).encode('latin1').decode('utf-16')

# '云'
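If you are building the byte values yourself anyway, you can skip the chr/encode round trip entirely. A minimal Python 3 sketch (145 and 78 are the UTF-16-LE bytes of U+4E91 from the question):

bytes([145, 78])
# b'\x91N'
bytes([145, 78]).decode('utf-16-le')
# '云'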

As for why the utf-8 version fails: chr(145) gets encoded as 2 bytes in utf-8 (as with all code points above 127):

chr(145).encode('utf8')
# b'\xc2\x91'
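That extra lead byte is what breaks the UTF-16 decoding: the utf-8 step turns your two characters into 3 bytes, and UTF-16 consumes bytes in pairs, so the lone 0x4e left over at position 2 is reported as truncated data. You can inspect the intermediate bytes yourself:

(chr(145) + chr(78)).encode('utf-8')
# b'\xc2\x91N' - 3 bytes, the last one can't form a complete UTF-16 code unit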

With latin1, on the other hand, you get the single byte you wanted:

chr(145).encode('latin1')
# b'\x91'
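This also explains why (chr(93) + chr(78)) from the question behaves the same either way: for code points below 128, utf-8 and latin1 produce the identical single byte, so the two round trips agree. A quick check:

chr(93).encode('utf-8') == chr(93).encode('latin1')
# True - both are b']'
(chr(93) + chr(78)).encode('utf-8').decode('utf-16-le')
# '九' - same result as with latin1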

Upvotes: 2
