VladoPortos

Reputation: 603

'utf8' codec can't decode byte 0xc3 when calling decode('utf-8') in Python

Today I was hit with a strange error in my script:

'utf8' codec can't decode byte 0xc3 in position 21: invalid continuation byte

I'm reading data from a socket with sock.recv, and the result is buff.decode('utf-8'), where buff is the returned data.

Today, though, I ran into something of a "unicorn": one of the returned characters was "▒", and that is what makes decode('utf-8') raise the exception. Is there some preprocessing step that would either remove or replace such a strange character?

Upvotes: 4

Views: 2505

Answers (1)

Taku

Reputation: 33714

The .decode() method takes a second parameter named errors. Set it to 'ignore' to silently drop any bytes that are not valid UTF-8, or to 'replace' to substitute the Unicode replacement character (�, U+FFFD, the question mark in a diamond) for each of them.

buff.decode('utf-8', 'ignore')
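
To see the difference between the two handlers, here is a minimal sketch. The contents of buff are made up for illustration (the real bytes from the socket are not shown in the question); the sample just places 0xc3 in front of a byte that is not a valid continuation byte, which reproduces the same kind of error.

```python
# Hypothetical sample data: 0xC3 opens a two-byte UTF-8 sequence,
# but the following byte "(" is not a valid continuation byte.
buff = b"temp: 21\xc3(C"

# The default handler is 'strict', which raises UnicodeDecodeError.
strict_error = None
try:
    buff.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc.reason

# 'ignore' drops the offending byte and keeps everything else.
ignored = buff.decode("utf-8", errors="ignore")

# 'replace' puts U+FFFD where the offending byte was.
replaced = buff.decode("utf-8", errors="replace")

print(strict_error)
print(ignored)
print(replaced)
```

'replace' is usually the safer choice when you want to notice later that something was lost, since 'ignore' leaves no trace of the dropped byte.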

Upvotes: 5
