PHP charset issue

Question

I'm writing a basic function in PHP that takes an input string and converts a list of "weird" characters to URL-friendly ones. Writing the function is not the issue; the problem is how PHP interprets strings containing those characters.

For example, right now I have this problem:

$string = "år";
echo $string[0]; // Output: �
echo $string[1]; // Output: �
echo $string[0] . $string[1]; // Output: å
echo $string[2]; // Output: r

So basically it interprets the letter "å" as two characters, which causes a problem for me, because I want to be able to look at each character of the string individually and replace it if needed.

I encode everything in UTF-8, and I know my issue has something to do with UTF-8 storing these characters as two bytes, as shown above.

But how do I work around this? Basically I want to achieve this:

$string = "år";
echo $string[0]; // Output: å
echo $string[1]; // Output: r

Artjom Kurapov · Accepted Answer

UTF-8 is a variable-width encoding: ASCII characters take one byte, but non-ASCII characters such as "å" take two or more. Array-style access on a PHP string returns a single byte, not a whole character, which is why $string[0] and $string[1] each print half of "å". To get whole characters, use the multibyte string functions instead:

echo mb_substr($string, 0, 1, 'UTF-8'); // Output: å
echo mb_substr($string, 1, 1, 'UTF-8'); // Output: r
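
Since the goal in the question is to inspect and replace each character, here is a minimal sketch of one way to do that. It assumes the source file and the input are both UTF-8 and that the mbstring extension is loaded; the function name replace_chars and the replacement map are made up for illustration and are not part of PHP. Note that it splits on code points, so a letter written as a base character plus a combining mark would still come out as two items.

```php
<?php
// Hypothetical helper (not a built-in): replace characters one at a time.
function replace_chars(string $input, array $map): string
{
    // The "u" modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern splits between code points instead of between bytes.
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() returns false on invalid UTF-8; leave input untouched.
        return $input;
    }

    $out = '';
    foreach ($chars as $char) {
        // Use the mapped replacement if there is one, else keep the character.
        $out .= $map[$char] ?? $char;
    }
    return $out;
}

$string = "år";
echo strlen($string), "\n";                       // 3 (bytes)
echo mb_strlen($string, 'UTF-8'), "\n";           // 2 (characters)
echo replace_chars($string, ['å' => 'a']), "\n";  // ar
```

On PHP 7.4 or later, mb_str_split($string) should give the same array of characters as the preg_split() call.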
