Petschko

Reputation: 178

PHP UTF-8 mb_convert_encoding and Internet Explorer

For the past few days I have been reading about character encoding, and I want to serve all my pages as UTF-8 for compatibility. But I get stuck when I try to convert user input to UTF-8: it works in all browsers except Internet Explorer (as always).

I don't know what's wrong with my code; it seems fine to me.

This only happens if you pass the value to the page via $_GET in Internet Explorer, e.g. myscript.php?c=äüöß. When I write special characters directly into my page, they are displayed correctly.

This is my code:

// User Input
$_GET['c'] = "äüöß"; // Access URL ?c=äüöß
//--------
header("Content-Type: text/html; charset=utf-8");
mb_internal_encoding('UTF-8');

$_GET = userToUtf8($_GET);

function userToUtf8($string) {
    if(is_array($string)) {
        $tmp = array();
        foreach($string as $key => $value) {
            $tmp[$key] = userToUtf8($value);
        }
        return $tmp;
    }

    return userDataUtf8($string);
}

function userDataUtf8($string) {
    print("1: " . mb_detect_encoding($string) . "<br>"); // Shows: 1: UTF-8
    $string = mb_convert_encoding($string, 'UTF-8', mb_detect_encoding($string)); // Convert non UTF-8 String to UTF-8
    print("2: " . mb_detect_encoding($string) . "<br>"); // Shows: 2: ASCII
    $string = preg_replace('/[\xF0-\xF7].../s', '', $string);
    print("3: " . mb_detect_encoding($string) . "<br>"); // Shows: 3: ASCII

    return $string;
}
echo $_GET['c']; // Shows nothing
echo mb_detect_encoding($_GET['c']); // ASCII
echo "äöü+#"; // Shows "äöü+#"

The most confusing part is that the output shows the string being converted from UTF-8 to ASCII. Can someone tell me why the special characters are not displayed correctly and what is wrong here? Or is this a bug in Internet Explorer?

Edit: If I disable the conversion, it reports everything as UTF-8, but the characters still are not displayed. They show up as "????".

Note: This happens ONLY in Internet Explorer!

Upvotes: 1

Views: 1687

Answers (2)

frz3993

Reputation: 1635

Although I prefer using URL-encoded strings in the address bar, in your case you can try encoding $_GET['c'] to UTF-8, e.g.:

$_GET['c'] = utf8_encode($_GET['c']);
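
Note that utf8_encode always treats its input as ISO-8859-1, so it would double-encode a value that already arrives as UTF-8, and it is deprecated as of PHP 8.2. A minimal sketch of a more defensive variant follows. It assumes the mbstring extension is loaded and that any input which is not valid UTF-8 is Windows-1252 (a guess about what a legacy browser sends, not something the question confirms). The helper name toUtf8 is hypothetical. The last lines show the URL-encoded alternative for building the link.

```php
<?php
// Hypothetical helper: return $value as valid UTF-8.
// Assumption: bytes that are not valid UTF-8 are Windows-1252.
function toUtf8(string $value, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave it untouched
    }
    // Convert from the assumed legacy encoding instead of guessing
    // with mb_detect_encoding().
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// "äüöß" written as explicit UTF-8 bytes, so the example does not
// depend on the encoding of this source file.
$utf8 = "\xC3\xA4\xC3\xBC\xC3\xB6\xC3\x9F";

// Percent-encode the value when building a link, so every browser
// requests the same UTF-8 bytes.
$link = 'myscript.php?c=' . rawurlencode($utf8);
```

Applied to the question's code, this would replace the body of userDataUtf8 with a single call such as $string = toUtf8($string);.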

Upvotes: 2

Answers_Seeker

Reputation: 468

An approach that worked for displaying the characters in IE 11.0.18:

  • Retrieve the Unicode code point of your character, for example 'ü' = 'U+00FC'

  • According to this post, convert it to a UTF-8 entity

  • Decode it using utf8_decode before dumping

The line of code illustrating the example with the 'ü' character is:

var_dump(utf8_decode(html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", 'U+00FC'), ENT_NOQUOTES, 'UTF-8')));

To summarize: for display purposes, go from the Unicode code point to UTF-8, then decode it before displaying it.
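
The first two steps can also be sketched without the entity round trip, assuming PHP 7.2 or later with the mbstring extension (for mb_chr). The helper name codePointToUtf8 is hypothetical. The sketch deliberately skips the utf8_decode step: utf8_decode converts to ISO-8859-1, which would only display correctly on a page served as ISO-8859-1, whereas the question sends charset=utf-8.

```php
<?php
// Hypothetical helper: turn "U+00FC" notation into the UTF-8 character.
function codePointToUtf8(string $notation): string
{
    if (!preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $notation, $m)) {
        throw new InvalidArgumentException("Not a U+XXXX code point: $notation");
    }
    $char = mb_chr((int) hexdec($m[1]), 'UTF-8');
    if ($char === false) {
        // mb_chr() returns false for values that are not valid code points.
        throw new InvalidArgumentException("Invalid code point: $notation");
    }
    return $char;
}

$char = codePointToUtf8('U+00FC');
echo bin2hex($char), "\n"; // prints c3bc, the UTF-8 bytes of 'ü'
```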

Other resources: a post on retrieving a character's Unicode code point

Upvotes: 1
