GHOV

Reputation: 405

PHP iconv_strlen question

What does it mean when iconv_strlen fails on "bad character sequences"? Specifically, what is a character sequence? Thanks.

Upvotes: 3

Views: 401

Answers (1)

Stefan Gehrig

Reputation: 83682

A character sequence is a series of bytes that together encode one or more characters. In UTF-8, not every combination of bytes is valid.

The byte sequence \xc2\xbc forms the Unicode character U+00BC which is the VULGAR FRACTION ONE QUARTER symbol (¼) when using UTF-8 encoding.

The byte sequence \xe2\x88\x9c forms the Unicode character U+221C which is the FOURTH ROOT symbol (∜) when using UTF-8 encoding.
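As a rough illustration of the two valid sequences above, the following sketch compares the byte count with the character count. The variable names are made up for this example, and it assumes the iconv extension is enabled.

```php
<?php
// Hypothetical variable names; the byte sequences are the ones quoted above.
$quarter    = "\xc2\xbc";      // U+00BC, two bytes in UTF-8
$fourthRoot = "\xe2\x88\x9c";  // U+221C, three bytes in UTF-8

// strlen() counts bytes; iconv_strlen() counts characters in the given encoding.
var_dump(strlen($quarter));                    // int(2)
var_dump(iconv_strlen($quarter, 'UTF-8'));     // int(1)
var_dump(strlen($fourthRoot));                 // int(3)
var_dump(iconv_strlen($fourthRoot, 'UTF-8'));  // int(1)
```

Both strings are well-formed UTF-8, so iconv_strlen can decode them and reports one character each.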

A bad character sequence for UTF-8 encoding is any byte combination that doesn't fit the required schema for UTF-8 byte streams. For example, the byte sequence \xbc\xbc is illegal: a two-byte character must have 110xxxxx in its first byte, but \xbc is 10111100 written as bits. The 10xxxxxx pattern marks a continuation byte, which can only follow a lead byte and can never start a character.
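A minimal sketch of what that failure looks like in practice, assuming the iconv extension is enabled. The variable names are illustrative, and the exact notice text can vary with the underlying iconv implementation.

```php
<?php
// Hypothetical variable names; the byte sequence is the illegal one from above.
$bad = "\xbc\xbc";

// Print the first byte as bits: the leading "10" marks a continuation byte,
// which cannot start a UTF-8 character.
printf("%08b\n", ord($bad[0])); // 10111100

// iconv_strlen() returns false when it cannot decode the input.
// The @ operator suppresses the notice PHP raises alongside that failure.
$length = @iconv_strlen($bad, 'UTF-8');

if ($length === false) {
    echo "Input is not valid UTF-8\n";
} else {
    echo "Length: $length\n";
}
```

Comparing against false with === matters here, because a valid empty string has length 0, which is also falsy.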

Upvotes: 4
