For dummies: in PHP, what is the difference between single-byte strings and multi-byte strings, and in which situations should we use one or the other?
For single-byte encodings (e.g. US-ASCII, the ISO 8859 family) use substr, and for multi-byte encodings (e.g. UTF-8, UTF-16) use mb_substr:
// single-byte strings
$result = substr($myStr, 0, 5);
// multi-byte strings
$result = mb_substr($myStr, 0, 5);
For instance, if I plan to develop something to be used in China, do I need to take any special measures because of Chinese characters? Isn't UTF-8 encoding good enough?
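To illustrate the distinction, here is a minimal sketch comparing substr and mb_substr on ASCII text and on Chinese text. It assumes the mbstring extension is enabled and that the file is saved as UTF-8; the variable names are illustrative only. On ASCII both functions agree, because every character is one byte. On Chinese text, where each of these characters takes 3 bytes in UTF-8, substr cuts by bytes while mb_substr cuts by characters.

```php
<?php
// Assumes: mbstring extension enabled, source file saved as UTF-8.
$ascii   = 'Hello, world';
$chinese = '你好，世界'; // 5 characters, 3 bytes each in UTF-8 = 15 bytes

// Pure ASCII: one byte per character, so both functions return the same thing.
var_dump(substr($ascii, 0, 5));               // string(5) "Hello"
var_dump(mb_substr($ascii, 0, 5, 'UTF-8'));   // string(5) "Hello"

// Chinese text: substr() counts bytes, mb_substr() counts characters.
var_dump(substr($chinese, 0, 3));             // string(3) "你"
var_dump(mb_substr($chinese, 0, 3, 'UTF-8')); // string(9) "你好，"
```

In other words, UTF-8 can represent Chinese text fine; the point is to use the mb_* functions (with the encoding passed explicitly) whenever the code slices or measures that text.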
The function strlen treats every byte as one character and returns the number of bytes, while mb_strlen returns the number of characters.
A single character can take more than one byte (in UTF-8, for example).
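The byte/character difference can be seen directly with a small helper. This is a sketch only: `describeLength` is a hypothetical name, and it assumes the mbstring extension is available and the strings are UTF-8.

```php
<?php
// Hypothetical helper: reports byte length vs character length of a UTF-8 string.
// Requires the mbstring extension.
function describeLength(string $s): string
{
    // strlen() counts bytes; mb_strlen() counts characters in the given encoding.
    return sprintf('%s: %d bytes, %d characters', $s, strlen($s), mb_strlen($s, 'UTF-8'));
}

foreach (['abc', '€10', '中文'] as $sample) {
    echo describeLength($sample), PHP_EOL;
}
// abc: 3 bytes, 3 characters
// €10: 5 bytes, 3 characters
// 中文: 6 bytes, 2 characters
```

Passing 'UTF-8' explicitly avoids depending on the mbstring internal encoding configured for the server.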
For your example:
$myStr = '៘៥឴ឨឆ';
// counts bytes: returns the first 5 bytes
$result = substr($myStr, 0, 5);
// counts characters: returns the first 5 characters
$result = mb_substr($myStr, 0, 5, mb_detect_encoding($myStr));
In this example, substr returns an invalid value: each of these characters takes more than one byte (3 bytes in UTF-8), so cutting after 5 bytes splits the second character. mb_substr, on the other hand, returns the correct data.
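One way to confirm that the substr result in the example above is broken is to validate both results with mb_check_encoding. A minimal sketch, assuming the mbstring extension: the string here is five Khmer code points written as `\u{...}` escapes (so it does not depend on the file's encoding), each taking 3 bytes in UTF-8, and the encoding is passed as 'UTF-8' instead of being detected.

```php
<?php
// Five Khmer code points, 3 bytes each in UTF-8 (15 bytes total).
$myStr = "\u{17D8}\u{17E5}\u{17B4}\u{17A8}\u{1786}";

$bytes = substr($myStr, 0, 5);             // first 5 BYTES
$chars = mb_substr($myStr, 0, 5, 'UTF-8'); // first 5 CHARACTERS

// substr() stops 2 bytes into the second character: invalid UTF-8.
var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)
// mb_substr() keeps all five characters intact.
var_dump(mb_check_encoding($chars, 'UTF-8')); // bool(true)
var_dump(strlen($chars));                     // int(15)
```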