Reputation: 6822
I'm writing a basic PHP function that takes an input string and converts a list of "weird" characters to URL-friendly ones. Writing the function is not the issue; the issue is how PHP interprets strings with weird characters.
For example, right now I have this problem:
$string = "år";
echo $string[0]; // Output: �
echo $string[1]; // Output: �
echo $string[0] . $string[1]; // Output: å
echo $string[2]; // Output: r
So it treats the letter "å" as two characters, which is a problem because I want to look at each character of the string individually and replace it if needed.
I encode everything in UTF-8, and I know the issue has something to do with UTF-8 storing "weird" characters as two bytes, as seen above.
But how do I work around this? Basically I want to achieve this:
$string = "år";
echo $string[0]; // Output: å
echo $string[1]; // Output: r
Upvotes: 1
Views: 594
Reputation: 6155
UTF-8 is a variable-width encoding: ASCII characters take one byte each, but non-ASCII letters such as "å" take two or more bytes. Array-style access on a string variable returns a single byte, not a whole letter. To get a whole character, use the multibyte string functions:
echo mb_substr($string, 0, 1); // Output: å
echo mb_substr($string, 1, 1); // Output: r
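To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8, that the mbstring extension is loaded, and that "å" is the single precomposed code point (U+00E5). Splitting with `preg_split` and the `/u` modifier is one way to get an array of whole characters to loop over; it is offered as an illustration, not as the only approach.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$string = "år";

echo strlen($string), "\n";             // counts bytes: 3
echo mb_strlen($string, 'UTF-8'), "\n"; // counts characters: 2

// The /u modifier makes PCRE treat the subject as UTF-8,
// so the empty pattern splits between whole characters, not bytes.
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chars as $i => $char) {
    echo $i, ': ', $char, "\n"; // 0: å, then 1: r
}
```

With the array in hand, `$chars[0]` is "å" and `$chars[1]` is "r", which matches the indexing the question asks for.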
Upvotes: 1
Reputation: 48887
$string = "år";
mb_internal_encoding('UTF-8');
echo mb_substr($string, 0, 1); // å
echo mb_substr($string, 1, 1); // r
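For the stated goal of replacing "weird" characters with URL-friendly ones, per-character indexing may not be needed at all. The sketch below uses `strtr` with a replacement array, which matches whole byte sequences and therefore replaces multi-byte characters intact. The function name `to_url_friendly` and the contents of the map are hypothetical, chosen only for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper; the replacement map is illustrative, not exhaustive.
function to_url_friendly(string $input): string
{
    $map = [
        'å' => 'a',
        'ä' => 'a',
        'ö' => 'o',
        ' ' => '-',
    ];

    // strtr() with an array looks for each key as a byte sequence,
    // so a two-byte UTF-8 character is replaced as one unit.
    return strtr($input, $map);
}

echo to_url_friendly("år på ö"); // ar-pa-o
```

The map keys must be in the same encoding as the input string (UTF-8 here), otherwise nothing will match.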
Upvotes: 2