Reputation:
BLUF: Why is the decode() method on a bytes object failing to decode ç?
I am receiving a UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position ... Upon tracking down the character, it turns out to be the ç character. The error occurs when I read the response from the server:
import http.client

conn = http.client.HTTPConnection(host='something.com')
conn.request('GET', url='/some/json')
resp = conn.getresponse()
content = resp.read().decode()  # throws UnicodeDecodeError (decode() defaults to UTF-8)
I am unable to get the content. If I just do content = resp.read(), it succeeds and I can write the bytes to a file opened in wb mode, but wherever the ç should be, the file contains 0xE7 instead. Even if I open the file in Notepad++ and set the encoding to UTF-8, the character only shows as the hex version.
Why am I not able to decode this UTF-8 character from an HTTPResponse? Am I not correctly writing it to file either?
Upvotes: 0
Views: 401
Reputation: 695
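When you have issues with encoding/decoding, you should take a look at the UTF-8 Encoding Debugging Chart. If you look up the Windows-1252 code point 0xE7 in the chart, you find that the expected character is ç, which shows that the response is encoded as CP1252, not UTF-8. In UTF-8, ç is the two-byte sequence 0xC3 0xA7, so a 0xE7 byte by itself does not encode any character, and decode() (which defaults to UTF-8) fails on it.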
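As a minimal sketch of what this means for the code in the question: decode the bytes with the encoding the server actually used, then re-encode as UTF-8 if that is what you want on disk. The helper name decode_body and the sample bytes are hypothetical, and the CP1252 fallback is an assumption based on the 0xE7 byte seen here, not something the server has confirmed.

```python
def decode_body(raw, declared_charset=None, fallback="cp1252"):
    """Decode raw bytes with the declared charset, else with the fallback.

    The cp1252 fallback is an assumption for this particular server.
    """
    return raw.decode(declared_charset or fallback)


# Hypothetical sample standing in for resp.read(): 0xE7 is 'ç' in CP1252.
raw = b"Fran\xe7ais"

try:
    raw.decode()  # bytes.decode() defaults to UTF-8
except UnicodeDecodeError as exc:
    print("UTF-8 decoding failed:", exc.reason)

text = decode_body(raw)            # decoded as CP1252
utf8_bytes = text.encode("utf-8")  # 'ç' becomes the two bytes 0xC3 0xA7
print(text)
print(utf8_bytes)
```

With a real response, resp.headers.get_content_charset() returns the charset from the Content-Type header when the server sends one, and that value could be passed as declared_charset. To get a file that Notepad++ reads as UTF-8, write the decoded text with open(path, 'w', encoding='utf-8') rather than writing the raw bytes in wb mode.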
Upvotes: 1