Gijs P

Reputation: 1470

UTF-8 encoding with Internet Explorer: %u20AC to €

I'm currently using TinyMCE as the HTML editor for users of my CMS. Somehow the euro symbol (€) is converted to %u20AC by IE (any version).

After a short search I found this. It lists many different encodings of the euro symbol, but not %u20AC, with the percent sign.

I have sent the proper headers for UTF-8, so I guess IE is just being rude and doing things its own way...

Is there a PHP function that can catch this strange encoding and convert it to a normal HTML entity (hex, decimal or named)? I could just str_replace() this single problem symbol, but I'd rather fix all possible conflicts at once.

Or should I simply replace %u with &#x, disabling normal usage of %u?

Upvotes: 3

Views: 8699

Answers (2)

hakre

Reputation: 197659

%u20AC is the Unicode-escaped form of € (U+20AC). It is generated by the JavaScript escape() function (MDN, ECMA-262) and has to be converted to UTF-8 for server-side processing.
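For comparison, standard percent-encoding works on the UTF-8 bytes of a character, not on its code point. A quick PHP check shows both forms side by side; it is only a sketch and assumes PHP 7.0+ for the "\u{...}" string syntax. The variable names are made up for illustration.

```php
<?php
$euro = "\u{20AC}"; // the euro sign as a UTF-8 string (PHP 7.0+ syntax)

// Standard percent-encoding escapes each UTF-8 byte of the character.
$standard  = rawurlencode($euro);             // "%E2%82%AC"
$roundTrip = urldecode($standard) === $euro;  // true

// The %uXXXX form carries the UTF-16 code unit instead. urldecode()
// only understands %XX with two hex digits, so it returns this unchanged.
$untouched = urldecode('%u20AC') === '%u20AC'; // true

var_dump($standard, $roundTrip, $untouched);
```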

Standard PHP urldecode() cannot deal with it (it is a non-standard percent-encoding, see Wikipedia), so you need to use an extended routine:

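/**
 * Decode a URL-encoded string that may contain non-standard %uXXXX escapes.
 *
 * @param string $string URL-encoded string, possibly with %uXXXX sequences
 * @return string decoded UTF-8 string
 */
function utf8_urldecode($string) {
    // Decode the standard %XX sequences first (urldecode() leaves %uXXXX
    // alone), then rewrite each %uXXXX as a hex character reference.
    $string = preg_replace(
        "/%u([0-9a-f]{3,4})/i",
        "&#x\\1;",
        urldecode($string)
    );
    // Turn the character references into UTF-8. Note that this also
    // decodes other entities (such as &amp;) already present in the string.
    return html_entity_decode($string, ENT_XML1, 'UTF-8');
}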
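If the input is HTML from an editor, decoding every entity may not be what you want. As an alternative, here is a minimal sketch that converts only the %uXXXX sequences directly to UTF-8 and leaves everything else alone. It assumes the mbstring extension is available, and decode_percent_u() is a hypothetical name, not a built-in function. Consecutive escapes are decoded together so that UTF-16 surrogate pairs (characters outside the Basic Multilingual Plane) come out as a single character.

```php
<?php
/**
 * Hypothetical helper: decode %uXXXX escapes straight to UTF-8,
 * without touching HTML entities or ordinary %XX sequences.
 */
function decode_percent_u(string $input): string
{
    return preg_replace_callback(
        // Match runs of %uXXXX units so surrogate pairs stay together.
        '/(?:%u[0-9a-f]{4})+/i',
        static function (array $m): string {
            // Drop the "%u" markers; what remains is big-endian UTF-16 in hex.
            $hex = str_ireplace('%u', '', $m[0]);
            return mb_convert_encoding(hex2bin($hex), 'UTF-8', 'UTF-16BE');
        },
        $input
    );
}

echo decode_percent_u('Price: 10 %u20AC'), PHP_EOL; // Price: 10 €
```

Values in $_GET and $_POST are already URL-decoded by PHP, so in that case only the %uXXXX sequences should still need handling.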

Also check if you can configure this behaviour for your TinyMCE.



Upvotes: 5

Chlebta

Reputation: 3110

20AC is the hex code of the euro sign, so you can solve this problem easily: in your HTML file, instead of typing the character itself, try using its character reference (for example &#x20AC;).

Upvotes: 0
