Reputation: 287
I'm looking for a way to encode a URL that contains Unicode characters, specifically Khmer. I tried PHP's built-in urlencode() function, and the result appears as: http://www.example.com/?kwd=Mac+Book+Pro+នៅប្រទេសយើង
When I run the same search on Google, the URL becomes: https://www.google.com.kh/#hl=en&sclient=psy-ab&q=Mac+Book+Pro+%E1%9E%93%E1%9F%85%E1%9E%94%E1%9F%92%E1%9E%9A%E1%9E%91%E1%9F%81%E1%9E%9F%E1%9E%99%E1%9E%BE%E1%9E%84&oq=Mac+Book+Pro+%E1%9E%93%E1%9F%85%E1%9E%94%E1%9F%92%E1%9E%9A%E1%9E%91%E1%9F%81%E1%9E%9F%E1%9E%99%E1%9E%BE%E1%9E%84
How can I get the same result in PHP? Thanks in advance!
Upvotes: 10
Views: 22128
Reputation: 574
For UTF-8 you can use:
urlencode($string); //for encoding
This gives exactly the same encoding as Google Search, as long as $string is UTF-8.
Note that in PHP you don't have to decode request parameters manually; PHP decodes them for you. If you do have an encoded string, use:
urldecode($string);
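To check this yourself, here is a minimal sketch, assuming the script file is saved as UTF-8 and using the example.com URL from the question. Many browsers display percent-encoded UTF-8 in the address bar as the original characters, which may be why the output looked unencoded; printing the string directly shows the actual bytes.

```php
<?php
// Keyword taken from the question; the source file is assumed to be saved as UTF-8.
$keyword = 'Mac Book Pro នៅប្រទេសយើង';

// urlencode() percent-encodes every non-ASCII byte and turns spaces into "+".
$encoded = urlencode($keyword);
$url = 'http://www.example.com/?kwd=' . $encoded;
echo $url, "\n";

// Decoding restores the original string.
var_dump(urldecode($encoded) === $keyword);
```

If the printed URL contains %E1%9E%93... sequences, the encoding already matches what Google produces.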
For the non-standard %uXXXX form (UTF-16 code units, as produced by JavaScript's escape()), you can use this function, adapted from the user notes for urlencode at http://php.net/urlencode:
function utf16_urlencode($str) {
    // Convert BMP characters at or above U+00FF into decimal numeric entities (&#NNNN;)
    $convmap = array(0xFF, 0xFFFF, 0, 0xFFFF);
    $str = mb_encode_numericentity($str, $convmap, "UTF-8");
    // Mark the entities so that urlencode() leaves them alone
    $str = preg_replace('/&#([0-9]{3,5});/', 'mark\\1mark', $str);
    $str = urlencode($str);
    // Convert the marked decimal values into hexadecimal %uXXXX escapes
    return preg_replace_callback('/mark([0-9]{3,5})mark/', function ($m) {
        return sprintf('%%u%04X', (int) $m[1]);
    }, $str);
}
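For comparison, here is a different way to produce %uXXXX escapes that goes through UTF-16BE directly. This is only a sketch: percent_u_encode() is a hypothetical helper name, it needs the mbstring extension and PHP 7+, and unlike the function above it escapes every non-ASCII character, including those below U+00FF.

```php
<?php
// Hypothetical helper: encode a UTF-8 string using non-standard %uXXXX escapes.
function percent_u_encode(string $str): string
{
    $out = '';
    // Split the UTF-8 string into individual characters.
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        if (strlen($char) === 1) {
            // Single-byte (ASCII) characters go through the normal encoder.
            $out .= urlencode($char);
            continue;
        }
        // Multi-byte characters: emit one %uXXXX per UTF-16 code unit,
        // so characters outside the BMP become a surrogate pair.
        $utf16 = mb_convert_encoding($char, 'UTF-16BE', 'UTF-8');
        foreach (str_split(bin2hex($utf16), 4) as $unit) {
            $out .= '%u' . strtoupper($unit);
        }
    }
    return $out;
}

// U+1793 is the first Khmer letter of the keyword in the question.
echo percent_u_encode("Pro \u{1793}"), "\n"; // Pro+%u1793
```

Keep in mind that PHP's urldecode() does not decode %uXXXX, so this form is only useful when the receiving side expects it.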
Upvotes: 14
Reputation: 2843
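function cleanUrl($url) {
    // Convert to UTF-8 only if needed; utf8_encode() would double-encode a string that is already UTF-8
    if (!mb_check_encoding($url, 'UTF-8')) {
        $url = mb_convert_encoding($url, 'UTF-8', 'ISO-8859-1');
    }
    // Encode everything, then restore the ":" and "/" separators.
    // Note: "?", "=" and "&" stay encoded, so this suits URLs without a query string.
    return str_replace(array('%3A', '%2F'), array(':', '/'), urlencode($url));
}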
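Another option is to encode only the query values instead of the whole URL, so the separators never need to be restored afterwards. A minimal sketch, where build_search_url() is a hypothetical helper and the kwd parameter comes from the question:

```php
<?php
// Hypothetical helper: append an encoded query string to an already valid base URL.
function build_search_url(string $base, array $params): string
{
    // http_build_query() URL-encodes every key and value (spaces become "+" by default).
    return $base . '?' . http_build_query($params);
}

// Assumes this file is saved as UTF-8.
echo build_search_url('http://www.example.com/', ['kwd' => 'Mac Book Pro នៅប្រទេសយើង']), "\n";
```

This leaves the scheme, host and path untouched and only encodes the user-supplied part.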
Upvotes: 2
Reputation: 4974
Try rawurlencode(), which encodes spaces as %20 rather than +:
http://php.net/manual/en/function.rawurlencode.php
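A quick comparison of the two functions, as a small sketch. They differ in how they treat spaces, while non-ASCII bytes are percent-encoded the same way by both:

```php
<?php
$keyword = 'Mac Book Pro';

echo urlencode($keyword), "\n";    // Mac+Book+Pro
echo rawurlencode($keyword), "\n"; // Mac%20Book%20Pro

// For a Khmer letter (U+1793) both functions give the same result.
var_dump(urlencode("\u{1793}") === rawurlencode("\u{1793}")); // bool(true)
```

rawurlencode() is usually the better fit for path segments, while urlencode() matches the form-style encoding used in query strings.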
Upvotes: 1