Reputation: 55
I ran this in Python:
'\xF5\x90\x90\x90'.decode('utf8')
But it raises an error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 0: invalid start byte
As far as I can tell, the string '\xF5\x90\x90\x90' is valid UTF-8. Its binary representation is 11110101 10010000 10010000 10010000, which complies with the four-byte UTF-8 pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
Why can't I decode this string?
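For reference, a quick check that the bytes really do match that pattern (a sketch; the bytearray is only so that indexing yields ints on both Python 2 and 3):

data = bytearray(b'\xF5\x90\x90\x90')
# Lead byte must look like 11110xxx, continuations like 10xxxxxx.
print((data[0] & 0b11111000) == 0b11110000)                   # True
print(all((b & 0b11000000) == 0b10000000 for b in data[1:]))  # True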
Upvotes: 3
Views: 300
Reputation: 308111
From Wikipedia:
In November 2003, UTF-8 was restricted by RFC 3629 to end at U+10FFFF, in order to match the constraints of the UTF-16 character encoding.
The character you're trying to decode is outside this range: specifically, it's U+150410.
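You can see where that value comes from by extracting the payload bits by hand (a minimal sketch; the bytearray is only so the arithmetic works the same on Python 2 and 3):

data = bytearray(b'\xF5\x90\x90\x90')
# 3 payload bits from the lead byte, 6 from each continuation byte.
codepoint = ((data[0] & 0b00000111) << 18 |
             (data[1] & 0b00111111) << 12 |
             (data[2] & 0b00111111) << 6 |
             (data[3] & 0b00111111))
print(hex(codepoint))  # 0x150410, above the U+10FFFF limit from RFC 3629

That is also why the error message complains about an invalid start byte rather than an out-of-range sequence: with the RFC 3629 limit in place, no valid UTF-8 sequence can begin with 0xF5 (the largest legal lead byte is 0xF4), so the codec rejects the very first byte.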
Upvotes: 5