'utf8' codec can't decode byte 0xc3 when calling decode('utf-8') in Python

Question

Today I hit a strange error in my script:

'utf8' codec can't decode byte 0xc3 in position 21: invalid continuation byte

I'm reading data from a socket with sock.recv and decoding the result with buff.decode('utf-8'), where buff is the returned data.

But today I found a real "unicorn": one of the returned characters was "▒", and that is what makes decode('utf-8') throw the exception. Is there some preprocessing step that would either remove or replace such a strange character?

Taku · Accepted Answer

There is a second parameter for .decode() named errors. You can set it to 'ignore' to drop any bytes that are not valid UTF-8, or to 'replace' to substitute each of them with the replacement character (�, U+FFFD).

buff.decode('utf-8', 'ignore')
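
To compare the handlers side by side, here is a minimal sketch. The payload is made up for illustration (it is not the asker's actual data): a lone 0xc3 byte followed by a space, which triggers the same "invalid continuation byte" error.

```python
# Hypothetical payload: 0xc3 starts a two-byte UTF-8 sequence, but the next
# byte is a plain space rather than a continuation byte, so the input is invalid.
buff = b"price: \xc3 10"

# Default behaviour (errors='strict') raises UnicodeDecodeError.
try:
    buff.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc.reason

# 'ignore' silently drops the undecodable byte.
ignored = buff.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD for the undecodable byte.
replaced = buff.decode("utf-8", errors="replace")

print(strict_error)
print(repr(ignored))
print(repr(replaced))
```

Note that 'ignore' loses data without leaving any trace, while 'replace' at least keeps a visible marker where the bad byte was. If such bytes show up often, the sender may be using a different encoding (for example Latin-1), in which case decoding with that encoding would be the better fix.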
