Reputation: 117
I am trying to pass hex-encoded parameters to an image-generating script. All documents are UTF-8 encoded. Everything works until I iterate over the string in a loop. Here is a minimal example:
$string="ABCDЯ";
for($i=0;$i<strlen($string);$i++) {
echo $string[$i]."<br>";
}
gives the output:
A
B
C
D
�
instead of
A
B
C
D
Я
Why is that? I need to analyze each character in the string, and this is where it fails: every Russian character ends up as �.
Upvotes: 1
Views: 449
Reputation: 3620
From the PHP manual:
The string in PHP is implemented as an array of bytes and an integer indicating the length of the buffer. It has no information about how those bytes translate to characters, leaving that task to the programmer.
So you are iterating over $string byte by byte. A character that is encoded with more than one byte, such as Я (two bytes in UTF-8), gets split into its individual bytes, and none of those bytes is a valid character on its own.
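To see the difference between bytes and characters, here is a small sketch. It assumes the mbstring extension is loaded and PHP 7 or later (for the "\u{...}" escape); the variable names are only for illustration.

```php
<?php
// "Я" is U+042F, which UTF-8 stores as the two bytes 0xD0 0xAF.
$string = "ABCD\u{042F}";

$byteCount    = strlen($string);              // counts bytes
$charCount    = mb_strlen($string, 'UTF-8');  // counts characters
$lastTwoBytes = bin2hex(substr($string, 4));  // raw bytes after "ABCD"

echo $byteCount, ' bytes, ', $charCount, " characters\n"; // 6 bytes, 5 characters
echo $lastTwoBytes, "\n";                                  // d0af
```

$string[4] and $string[5] each hold only one half of Я, which is why the browser shows � for them.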
Given that PHP does not dictate a specific encoding for strings, one might wonder how string literals are encoded. For instance, is the string "á" equivalent to "\xE1" (ISO-8859-1), "\xC3\xA1" (UTF-8, C form), "\x61\xCC\x81" (UTF-8, D form) or any other possible representation? The answer is that string will be encoded in whatever fashion it is encoded in the script file.
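The three representations of "á" mentioned in that quote can be compared directly. This is only an illustrative sketch, assuming the mbstring extension is available; the variable names are made up for the example.

```php
<?php
// The same visible letter "á" as three different byte sequences.
$latin1     = "\xE1";          // ISO-8859-1
$utf8       = "\xC3\xA1";      // UTF-8, composed (C form)
$decomposed = "\x61\xCC\x81";  // UTF-8, "a" + combining acute accent (D form)

$lengths = [strlen($latin1), strlen($utf8), strlen($decomposed)];
echo implode(', ', $lengths), "\n"; // 1, 2, 3

// Converting the ISO-8859-1 byte to UTF-8 yields the composed UTF-8 form.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $utf8); // bool(true)
```

As far as PHP is concerned these are three different strings; which one a literal "á" produces depends only on how the script file was saved.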
You can use mb_substr to get one character at a time when iterating over $string.
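<?php
$string = 'ABCDЯ';
// Count characters with mb_strlen; strlen counts bytes and would run the loop one time too many.
$length = mb_strlen($string, 'UTF-8');
for ($i = 0; $i < $length; $i++) {
    echo mb_substr($string, $i, 1, 'UTF-8') . '<br>';
}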
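If you would rather split the string once and then loop over the pieces, one possible alternative (not the only one) is preg_split with the "u" modifier, which does not need mbstring. The sketch below assumes the script file is saved as UTF-8; the bin2hex call is only there to illustrate the hex encoding mentioned in the question.

```php
<?php
$string = 'ABCDЯ'; // assumes this file is saved as UTF-8

// With the "u" modifier PCRE treats the subject as UTF-8, so the empty
// pattern matches between code points rather than between bytes.
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chars as $char) {
    // bin2hex shows the bytes that make up each character.
    echo $char, ' => ', bin2hex($char), '<br>';
}
```

On PHP 7.4 or later, mb_str_split($string, 1, 'UTF-8') should give the same array if mbstring is installed.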
Upvotes: 1