How to validate that a string is valid UTF-8 in Python 2.7

Question

I have the following byte string:

"\xed\xad\x80\xed\xb1\x93"

When I use this string in a query against a PostgreSQL database, it raises the following error:

DataError: invalid byte sequence for encoding "UTF8": 0xed 0xad 0x80

When I test it in Python 2.7 (before executing the query), decoding it as UTF-8 doesn't raise an exception:

Windows test:

'\xed\xad\x80\xed\xb1\x93'.decode("utf-8")
u'\U000e0053'

Linux test:

'\xed\xad\x80\xed\xb1\x93'.decode("utf-8")
u'\udb40\udc53'

In Python 3, the same decode does raise an exception:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte

How can I check in Python 2.7 that it's not a valid UTF-8 string?

Answers (1)
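
One possible approach, offered as a sketch rather than a definitive answer: the six bytes in the question are the three-byte encodings of the surrogate code points U+DB40 and U+DC53. RFC 3629 forbids surrogates in UTF-8, which is why PostgreSQL and Python 3 reject them, while the Python 2.7 codec accepts them. Since the 2.7 codec can't be relied on here, the bytes can be checked by hand against the well-formed byte ranges from RFC 3629. The function name is_strict_utf8 is hypothetical, and the code assumes its input is a byte string (str in Python 2.7, bytes in Python 3).

```python
def is_strict_utf8(data):
    """Return True if the byte string `data` is well-formed UTF-8 (RFC 3629)."""
    b = bytearray(data)  # indexing yields ints on both Python 2.7 and 3
    i, n = 0, len(b)
    while i < n:
        lead = b[i]
        if lead < 0x80:                    # ASCII, single byte
            i += 1
            continue
        # need = number of continuation bytes;
        # lo..hi = allowed range of the FIRST continuation byte
        if 0xC2 <= lead <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif lead == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF   # rejects overlong 3-byte forms
        elif lead == 0xED:
            need, lo, hi = 2, 0x80, 0x9F   # rejects surrogates U+D800..U+DFFF
        elif 0xE1 <= lead <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif lead == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF   # rejects overlong 4-byte forms
        elif 0xF1 <= lead <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif lead == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F   # rejects code points above U+10FFFF
        else:
            return False                   # 0x80..0xC1 and 0xF5..0xFF cannot lead
        if i + need >= n:
            return False                   # sequence is truncated
        if not lo <= b[i + 1] <= hi:
            return False
        for j in range(i + 2, i + need + 1):
            if not 0x80 <= b[j] <= 0xBF:   # remaining continuation bytes
                return False
        i += need + 1
    return True
```

With this, is_strict_utf8(b"\xed\xad\x80\xed\xb1\x93") returns False, so the string can be rejected before it reaches the database. A plain decode-and-catch check would be shorter, but as the question shows, it does not flag this input on Python 2.7.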