user2682863

Reputation: 3218

Difference between chr() and bytes.decode()

Can someone explain why I can convert a bytes object to a str via

>>> bytes_ = b';\xf7\xb8W\xef\x0f\xf4V'
>>> list(bytes_)
[59, 247, 184, 87, 239, 15, 244, 86]
>>> "".join([chr(x) for x in bytes_])
';÷¸Wï\x0fôV'

But if I call

>>> bytes_.decode()
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    bytes_.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 1: invalid start byte

I get an error.

Upvotes: 1

Views: 590

Answers (1)

DYZ

Reputation: 57105

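The default encoding used by .decode() is UTF-8, and bytes_ is not valid UTF-8: the byte 0xf7 at position 1 cannot start a UTF-8 sequence, which is exactly what the traceback reports. chr(n), on the other hand, does no decoding at all. It returns the Unicode character whose code point is n, so each byte value is simply reused as a code point. If you want .decode() to work, you must tell it which encoding to use. For example, utf-16 decodes without error, though the result is unlikely to be the text you intended: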

bytes_.decode('utf-16')
#'\uf73b垸\u0fef围'

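CP1252 works, too, but (expectedly) gives a different result from UTF-16. Here it happens to match the chr() output, since none of these bytes fall in the 0x80-0x9f range, where CP1252 departs from a plain byte-to-code-point mapping: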

bytes_.decode('cp1252')
#';÷¸Wï\x0fôV'
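
If what you actually want is the chr() behaviour as a single call, the codec that maps every byte to the code point with the same number is latin-1 (ISO-8859-1). A minimal sketch to check that, reusing bytes_ from the question; the names via_chr, via_latin1 and all_bytes are only illustrative:

```python
bytes_ = b';\xf7\xb8W\xef\x0f\xf4V'

# chr() reuses each byte value as a code point; latin-1 decodes the same way.
via_chr = "".join(chr(b) for b in bytes_)
via_latin1 = bytes_.decode('latin-1')
print(via_chr == via_latin1)
# True

# latin-1 assigns a character to every byte value, so any input round-trips.
all_bytes = bytes(range(256))
print(all_bytes.decode('latin-1').encode('latin-1') == all_bytes)
# True

# cp1252 differs in the 0x80-0x9f range: 0x80 is the euro sign there,
# not the control character chr(0x80).
print(b'\x80'.decode('cp1252') == chr(0x80))
# False
```

Note that none of this tells you which encoding the bytes were really produced with. latin-1 never fails, but it only gives meaningful text if the data was latin-1 to begin with.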

Upvotes: 4
