Reputation: 9439
I'm passing a pound symbol £ to a PHP page which has been URLEncoded by ASP as %C2%A3.
The problem:
urldecode("%C2%A3") // £
ord(urldecode("%C2%A3")) // get the character number - 194
ord("£") // 163 - something's gone wrong, they should match
This means that when I do utf8_encode(urldecode("%C2%A3")) I get Â£.
However, doing utf8_encode("£") I get £, as expected.
How can I solve this?
Upvotes: 4
Views: 12199
Reputation: 3122
The first comment on php.net for urlencode() explains why this is and suggests this code for correcting it:
<?php
function to_utf8( $string ) {
    // From http://w3.org/International/questions/qa-forms-utf-8.html
    if ( preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string) ) {
        // Already valid UTF-8: return unchanged
        return $string;
    } else {
        // Otherwise assume Windows-1252 and convert to UTF-8
        return iconv( 'CP1252', 'UTF-8', $string);
    }
}
?>
Also, you should decide whether you want the final HTML you send to the browser to be in UTF-8 or some other encoding; otherwise you will continue having Â£ characters in your code.
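To illustrate where the stray Â typically comes from, here is a minimal sketch that treats already-UTF-8 bytes as ISO-8859-1 and converts them a second time. It assumes the mbstring extension is available and uses mb_convert_encoding() as a stand-in for utf8_encode() (which is deprecated as of PHP 8.2); the variable names are only illustrative.

```php
<?php
// Tell the browser which encoding the page uses (skipped if output already started).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// urldecode() just turns %C2%A3 into the two raw bytes 0xC2 0xA3,
// which is already the UTF-8 encoding of the pound sign.
$decoded = urldecode("%C2%A3");

// Converting those bytes again as if they were ISO-8859-1 double-encodes them:
// 0xC2 becomes 0xC3 0x82 (Â) and 0xA3 becomes 0xC2 0xA3 (£).
$doubleEncoded = mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');

$decodedHex = bin2hex($decoded);             // c2a3
$doubleEncodedHex = bin2hex($doubleEncoded); // c382c2a3
```

If the page is served as UTF-8, the decoded value can be output as it is; no further encoding step should be needed.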
Upvotes: -1
Reputation: 8459
Some information about urldecode and UTF-8 can be found in the first comment of the urldecode documentation. It seems to be a known problem.
Upvotes: 2
Reputation: 15735
I don't think ord() is multibyte compatible. It's probably returning only the code for the first byte of the string, which is Â (194). Try to utf8_decode() the string before calling ord() on it and see if that helps.
ord(utf8_decode(urldecode("%C2%A3"))); // This returns 163
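As an alternative sketch, assuming PHP 7.2 or later with the mbstring extension, mb_ord() reads the whole multibyte character instead of a single byte, so no utf8_decode() step is needed (utf8_decode() is deprecated as of PHP 8.2). The variable names are only illustrative.

```php
<?php
// Two raw bytes, 0xC2 0xA3: the UTF-8 encoding of the pound sign.
$decoded = urldecode("%C2%A3");

// ord() looks at the first byte only.
$firstByte = ord($decoded);              // 194

// mb_ord() decodes the full UTF-8 sequence and returns the code point U+00A3.
$codePoint = mb_ord($decoded, 'UTF-8');  // 163

echo $firstByte, "\n", $codePoint, "\n";
```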
Upvotes: 3
Reputation: 8509
If you try
var_dump(urldecode("%C2%A3"));
you'll see
string(2) "£"
because this is a 2-byte character, and ord() returns the value of the first byte only (194 = Â).
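The byte count can be checked directly. A small sketch, assuming the mbstring extension is loaded for the character count (variable names are only illustrative):

```php
<?php
$decoded = urldecode("%C2%A3");

// strlen() counts bytes, which is why var_dump() reports string(2).
$byteCount = strlen($decoded);              // 2

// bin2hex() shows the two bytes that make up the character.
$hex = bin2hex($decoded);                   // c2a3

// mb_strlen() counts characters when told the string is UTF-8.
$charCount = mb_strlen($decoded, 'UTF-8');  // 1

echo $byteCount, " ", $hex, " ", $charCount, "\n";
```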
Upvotes: 4