Can someone explain why I can convert a bytes object to a str like this:
>>> bytes_ = b';\xf7\xb8W\xef\x0f\xf4V'
>>> list(bytes_)
[59, 247, 184, 87, 239, 15, 244, 86]
>>> "".join([chr(x) for x in bytes_])
';÷¸Wï\x0fôV'
But if I call bytes_.decode(), I get an error:
>>> bytes_.decode()
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    bytes_.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 1: invalid start byte
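To see exactly which byte UTF-8 rejects, a minimal sketch can catch the exception: UnicodeDecodeError records start, end and reason, and decode(..., errors="replace") substitutes U+FFFD for each invalid sequence instead of raising. Byte 0xf7 can never occur anywhere in well-formed UTF-8 (bytes 0xF5 to 0xFF are never valid), so the failure is not a quirk of position. This only locates the problem; it does not recover the intended text.

```python
bytes_ = b';\xf7\xb8W\xef\x0f\xf4V'

try:
    bytes_.decode("utf-8")
except UnicodeDecodeError as exc:
    # start/end delimit the rejected bytes inside exc.object (the input bytes)
    first_error = (exc.start, exc.end, exc.reason)
    print(first_error, exc.object[exc.start:exc.end])
    # (1, 2, 'invalid start byte') b'\xf7'

# errors="replace" puts U+FFFD in place of each invalid sequence instead of raising
lossy = bytes_.decode("utf-8", errors="replace")
print(repr(lossy))
```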
The default encoding used by .decode() is UTF-8. However, some of the bytes in bytes_ (starting with 0xf7 at position 1) do not form valid UTF-8 sequences. chr(n), on the other hand, does not decode anything: it simply returns the Unicode character whose code point is n. If you want .decode() to work, you must tell it which encoding to use. For example, utf-16 seems to work, in the sense that it raises no error:
bytes_.decode('utf-16')
#'\uf73b垸\u0fef围'
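As far as I know, CPython's utf-16 decoder assumes the platform's native byte order when there is no byte-order mark, and the output above matches little-endian order. A minimal sketch that spells the byte order out with utf-16-le and utf-16-be shows that both decode the same eight bytes without error yet yield completely different characters, so a decode that does not raise is no evidence that the encoding is right.

```python
bytes_ = b';\xf7\xb8W\xef\x0f\xf4V'

# Same 8 bytes, read as four 16-bit code units in each byte order
little = bytes_.decode("utf-16-le")  # code units 0xf73b, 0x57b8, 0x0fef, 0x56f4
big = bytes_.decode("utf-16-be")     # code units 0x3bf7, 0xb857, 0xef0f, 0xf456

print([hex(ord(c)) for c in little])
print([hex(ord(c)) for c in big])
print(little == big)  # False: both decode without error, yet they disagree
```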
CP1252 works too, but, as expected, gives a different result:
bytes_.decode('cp1252')
#';÷¸Wï\x0fôV'
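The cp1252 output is identical to the chr() result from the question. That is no coincidence: mapping each byte value 0 to 255 to the code point with the same number is exactly what Latin-1 (ISO-8859-1) decoding does, and cp1252 differs from Latin-1 only in bytes 0x80 to 0x9F, none of which occur in bytes_. A short check of that equivalence:

```python
bytes_ = b';\xf7\xb8W\xef\x0f\xf4V'

# chr(n) returns the character whose code point is n, so each byte value
# 0-255 becomes the code point U+0000-U+00FF with the same number
via_chr = "".join(chr(b) for b in bytes_)

# Latin-1 (ISO-8859-1) maps every byte to exactly that code point
via_latin1 = bytes_.decode("latin-1")
print(via_chr == via_latin1)  # True

# cp1252 differs from Latin-1 only in bytes 0x80-0x9F, none of which occur here
print(bytes_.decode("cp1252") == via_latin1)  # True

# The mapping is lossless, so encoding back recovers the original bytes
print(via_latin1.encode("latin-1") == bytes_)  # True
```

In other words, the chr() approach never fails because it is Latin-1 decoding in disguise; it does not tell you which encoding the bytes were actually written in.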