Reputation: 405
What does it mean when iconv_strlen fails on bad character sequences? Specifically, what is a "bad character sequence"? That
is what I want to know. Thanks
Upvotes: 3
Views: 401
Reputation: 83682
A character sequence is a series of bytes. When using UTF-8, not all combinations of bytes are valid.
The byte sequence \xc2\xbc forms the Unicode character U+00BC, which is the VULGAR FRACTION ONE QUARTER symbol (¼), when using UTF-8 encoding.
The byte sequence \xe2\x88\x9c forms the Unicode character U+221C, which is the FOURTH ROOT symbol (∜), when using UTF-8 encoding.
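To check the two mappings above yourself, here is a minimal sketch in Python (chosen only for illustration; it is not PHP and does not use iconv). The helper name describe is made up for this example. It decodes the raw bytes as UTF-8 and reports the resulting code point and its Unicode name.

```python
import unicodedata


def describe(raw: bytes) -> str:
    """Decode `raw` as UTF-8 and describe the single character it encodes.

    `describe` is a hypothetical helper for this example; it assumes the
    bytes encode exactly one character.
    """
    ch = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid bytes
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The two sequences discussed above.
print(describe(b"\xc2\xbc"))
print(describe(b"\xe2\x88\x9c"))
```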
A bad character sequence for UTF-8 encoding would be any byte combination that doesn't fit the required scheme for UTF-8 byte streams. For example, the byte sequence \xbc\xbc is illegal because a two-byte character must have 110xxxxx in its first byte, but \xbc is 10111100 written as bits. That is the 10xxxxxx pattern reserved for continuation bytes, so it cannot start a character.
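The bit-pattern rule can be sketched in a few lines of Python (again only as an illustration, not PHP). The function names lead_byte_kind and is_valid_utf8 are made up for this example, and the classifier is simplified: it looks only at the high bits and ignores the extra restrictions real UTF-8 imposes (for instance, 0xC0 and 0xC1 are never valid lead bytes).

```python
def lead_byte_kind(b: int) -> str:
    """Classify one byte by its high bits (simplified UTF-8 layout)."""
    if b & 0b10000000 == 0b00000000:
        return "single-byte (0xxxxxxx)"
    if b & 0b11000000 == 0b10000000:
        return "continuation (10xxxxxx)"
    if b & 0b11100000 == 0b11000000:
        return "2-byte lead (110xxxxx)"
    if b & 0b11110000 == 0b11100000:
        return "3-byte lead (1110xxxx)"
    if b & 0b11111000 == 0b11110000:
        return "4-byte lead (11110xxx)"
    return "invalid"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts `raw`."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# \xbc has the continuation pattern, so it cannot begin a character.
print(format(0xBC, "08b"), lead_byte_kind(0xBC))
print(is_valid_utf8(b"\xbc\xbc"))  # the bad sequence from the answer
print(is_valid_utf8(b"\xc2\xbc"))  # the valid two-byte sequence for ¼
```

A strict decoder rejects \xbc\xbc for exactly the reason given above; input of this kind is what a function such as iconv_strlen would be expected to treat as a bad character sequence.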
Upvotes: 4