UnicodeDecodeError on text comparison

Question

While performing a substring match, I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

Code:

for bhk in bed_bath:
    if "Bedroom" in bhk.text or "Chambre à coucher" in bhk.text or "Slaapkamer" in bhk.text:
        bhk_count += 1

How do I resolve it?

I have included the following lines at the beginning of my file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

memoselyk · Accepted Answer

I'm assuming you are using Python 2.

The problem is happening because bhk.text is a unicode string.

When you do a check like "Chambre à coucher" in bhk.text, the literal, which is a non-unicode (byte) string, has to be converted to a unicode string first.

Since you declared your file to use UTF-8 encoding, the character à in that literal is stored as the two bytes "\xc3\xa0".

When Python tries to convert the byte 0xc3 using the default codec (ASCII), it cannot map it to a unicode character, so it raises that error.
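The implicit conversion can be reproduced explicitly. The sketch below is written for Python 3, where the decode step has to be spelled out as .decode("ascii"), and uses a hypothetical helper, ascii_decode_error, for illustration only. It encodes the literal to UTF-8 bytes (which is how a UTF-8 source file stores a plain "..." literal in Python 2) and then decodes those bytes with the ASCII codec.

```python
# Python 3 sketch; ascii_decode_error is a hypothetical helper for illustration.


def ascii_decode_error(text):
    """Return the message raised when text's UTF-8 bytes are decoded as ASCII, or None."""
    raw = text.encode("utf-8")  # e.g. "à" becomes b"\xc3\xa0"
    try:
        raw.decode("ascii")  # Python 2 performs this step implicitly
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


print(ascii_decode_error("Chambre à coucher"))
print(ascii_decode_error("Bedroom"))
```

The first call reproduces the message from the question: 0xc3 is the first byte of à, at index 8 right after "Chambre ". A pure-ASCII label such as "Bedroom" decodes without error, which is why only the French label fails.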

The solution is to declare the strings that contain non-ASCII characters as unicode literals, like:

u"Chambre à coucher" in bhk.text
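Applied to the loop from the question, a sketch could look like the following. It assumes the texts are already unicode, as bhk.text is; count_bedrooms, BEDROOM_LABELS and the sample list are hypothetical stand-ins for bed_bath and bhk.text. The u prefix is accepted by Python 2 and by Python 3.3+, so the same code runs on both.

```python
# -*- coding: utf-8 -*-
# Every label carries the u prefix, so comparing it against unicode text
# never triggers an implicit ASCII decode in Python 2.

# Hypothetical name: the labels from the question, collected in one place
BEDROOM_LABELS = (u"Bedroom", u"Chambre à coucher", u"Slaapkamer")


def count_bedrooms(texts):
    """Count entries whose text contains any of the bedroom labels."""
    count = 0
    for text in texts:
        if any(label in text for label in BEDROOM_LABELS):
            count += 1
    return count


# Hypothetical stand-in for the bhk.text values scraped from the page
sample = [u"3 Bedroom", u"2 Chambre à coucher", u"1 Salle de bain", u"4 Slaapkamer"]
print(count_bedrooms(sample))  # 3
```

Keeping the labels in a tuple and testing them with any() means that supporting another language only requires adding one more u"..." entry.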
