UnicodeDecodeError (UTF-8) for JSON

Question

BLUF: Why is the decode() method on a bytes object failing to decode ç?

I am receiving a UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position..... After tracking down the character, I found that it is ç. The error occurs when I read the response from the server:

import http.client

conn = http.client.HTTPConnection(host='something.com')
conn.request('GET', url='/some/json')
resp = conn.getresponse()
content = resp.read().decode()  # raises UnicodeDecodeError

I am unable to get the content. If I just do content = resp.read(), it succeeds and I can write the result to a file opened in wb mode, but wherever the ç should be, the file contains 0xE7 instead. Even if I open the file in Notepad++ and set the encoding to UTF-8, the character only shows as its hex value.

Why am I not able to decode this UTF-8 character from an HTTPResponse? Am I not correctly writing it to file either?

Brian M. Sheldon · Accepted Answer

When you have issues with encoding/decoding, you should take a look at the UTF-8 Encoding Debugging Chart.

If you look up the Windows-1252 code point 0xE7 in the chart, you find that the expected character is ç, which shows that the encoding is CP1252, not UTF-8.
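As a rough illustration of that diagnosis, here is a minimal sketch that decodes a body by trying a declared charset first, then UTF-8, then CP1252. The helper name decode_body and the sample word are made up for this example and are not part of the question's code. The fallback order is an assumption that suits this case, not a general rule.

```python
def decode_body(raw, declared_charset=None):
    """Decode bytes using the declared charset, then UTF-8, then CP1252.

    decode_body is a hypothetical helper written for this illustration.
    """
    for encoding in (declared_charset, "utf-8", "cp1252"):
        if not encoding:
            continue  # no charset was declared, move on to the next guess
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or an encoding name Python does not know.
            continue
    # CP1252 leaves a few bytes undefined, so fall back to replacement.
    return raw.decode("cp1252", errors="replace")


# A lone 0xE7 byte is ç in CP1252 but is not valid UTF-8 on its own.
sample = "garçon".encode("cp1252")
text = decode_body(sample)
print(text)
```

With the code from the question, the declared charset (if the server sends one) should be available through resp.headers.get_content_charset(), which returns None when the Content-Type header does not specify it. To store the result as real UTF-8, write the decoded string with open(path, 'w', encoding='utf-8') rather than writing the raw bytes in wb mode. Writing the raw bytes keeps the single 0xE7 byte, which is why Notepad++ shows it as hex when told to read the file as UTF-8.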
