cyhhao

Reputation: 55

Why can't I decode this 'utf8' string in Python 2.7?

In Python I wrote:

'\xF5\x90\x90\x90'.decode('utf8')

But it raises this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 0: invalid start byte

The string \xF5\x90\x90\x90 is a standard 'utf8' string. Its binary is 11110101 10010000 10010000 10010000, which complies with the UTF-8 rule for a 4-byte sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.

Why can't I decode this string?
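
For comparison, here is a minimal reproduction sketch in Python 2.7; the \xF0\x90\x90\x90 bytes are an added example of a 4-byte sequence that does decode, not part of my original data:

# Both byte strings match the 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx pattern,
# but only the first one decodes.
print repr('\xF0\x90\x90\x90'.decode('utf8'))  # u'\U00010410'
print repr('\xF5\x90\x90\x90'.decode('utf8'))  # UnicodeDecodeError: invalid start byte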

Upvotes: 3

Views: 300

Answers (1)

Mark Ransom

Reputation: 308111

From Wikipedia:

In November 2003, UTF-8 was restricted by RFC 3629 to end at U+10FFFF, in order to match the constraints of the UTF-16 character encoding.

The character you're trying to decode is outside of this range. Specifically, it's U+150410.
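
As a sketch (not part of the original answer), you can decode the four bytes by hand using the 4-byte pattern from the question to see which code point they would represent, and check it against the RFC 3629 limit:

# Decode the 4-byte sequence manually: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
data = b'\xF5\x90\x90\x90'
b0, b1, b2, b3 = bytearray(data)

codepoint = ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)

print(hex(codepoint))        # 0x150410
print(codepoint > 0x10FFFF)  # True -- beyond the range allowed by RFC 3629

This is also why 0xF5 is reported as an "invalid start byte": no 4-byte sequence starting with 0xF5 can encode a code point at or below U+10FFFF.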

Upvotes: 5
