Reputation:
I'm writing a PHP application that accepts a URL from the user and then processes it by making some calls to binaries with system()*. However, to avoid the many complications that arise with this, I'm trying to convert the URL, which may contain Unicode characters, into ASCII characters.
Let's say I have the following URL:
https://täst.de:8118/news/zh-cn/新闻动态/2015/
Here two parts need to be dealt with: the hostname and the path. For the hostname, I can use idn_to_ascii(). However, I can't simply run urlencode() over the path, as each of the characters that need to remain unmodified (such as /) will also be converted, e.g. news/zh-cn/新闻动态/2015/ -> news%2Fzh-cn%2F%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81%2F2015%2F as opposed to news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/.
How should I approach this problem?
*I'd rather not deal with system() calls and the resulting complexity, but given that the functionality is only available by calling binaries, I unfortunately have no choice.
Upvotes: 3
Views: 823
Reputation:
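The following function can be used for this transformation. It walks the path byte by byte, keeps the characters that may appear unescaped, and percent-encodes everything else:
    function convertpath($path) {
        $path1 = '';
        $len = strlen($path); // byte length, so multibyte characters are handled one byte at a time
        for ($i = 0; $i < $len; $i++) {
            if (preg_match('/^[A-Za-z0-9\/?=+%_.~-]$/', $path[$i])) {
                // Kept as-is: "/" separators, existing %XX escapes and other safe characters
                $path1 .= $path[$i];
            } else {
                // rawurlencode() yields %20 for a space, where urlencode() would yield "+"
                $path1 .= rawurlencode($path[$i]);
            }
        }
        return $path1;
    }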
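The same byte-wise idea can also be written without an explicit loop. The following is only a sketch: encode_path_bytes() is a made-up name, the allowed character set is copied from the function above, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: percent-encodes every byte that is not in the
// allowed set. No /u modifier is used, so the pattern matches single
// bytes and each byte of a multibyte character is encoded on its own.
function encode_path_bytes(string $path): string
{
    return preg_replace_callback(
        '/[^A-Za-z0-9\/?=+%_.~-]/',
        static function (array $m): string {
            return rawurlencode($m[0]);
        },
        $path
    );
}

echo encode_path_bytes('news/zh-cn/新闻动态/2015/'), "\n";
// news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/
```

Because % is in the allowed set, existing escapes such as %20 are passed through rather than double-encoded; the trade-off is that a literal % in the input is not escaped either.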
Upvotes: 0
Reputation: 1802
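You could use PHP's iconv() function:
    iconv("UTF-8", "ASCII//TRANSLIT", $url);
Note that this transliterates or replaces non-ASCII characters rather than percent-encoding them, so the resulting URL may no longer refer to the same resource.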
Upvotes: 0
Reputation: 16943
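Split the URL by /, convert the host with idn_to_ascii(), encode each path segment on its own, then put it back together. This assumes a URL shaped like the example, with no query string:
    $parts = explode("/", $url);
    // $parts[2] is "host:port"; convert only the host part to Punycode
    $hostPort = explode(":", $parts[2], 2);
    $hostPort[0] = idn_to_ascii($hostPort[0]);
    $parts[2] = implode(":", $hostPort);
    // Encode every path segment (index 3 onwards), not just one fixed index
    for ($i = 3; $i < count($parts); $i++) { $parts[$i] = rawurlencode($parts[$i]); }
    $url = implode("/", $parts);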
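For URLs that do not match the exact shape of the example, a slightly more general version of the split-and-rejoin idea might look like the sketch below. url_to_ascii() is a hypothetical helper, not a built-in. It assumes UTF-8 input of the form scheme://host[:port][/path][?query][#fragment] with no userinfo part, and it assumes the intl extension is loaded so that idn_to_ascii() is available.

```php
<?php
// Sketch only; url_to_ascii() is a made-up name for illustration.
function url_to_ascii(string $url): string
{
    // Groups: scheme, host, optional ":port", path, rest (query and fragment).
    $pattern = '~^([a-z][a-z0-9+.-]*)://([^/:?#]+)((?::\d+)?)([^?#]*)(.*)$~i';
    if (!preg_match($pattern, $url, $m)) {
        throw new InvalidArgumentException("Unsupported URL: $url");
    }
    [, $scheme, $host, $port, $path, $rest] = $m;

    // Hostname: convert to Punycode using the UTS #46 variant.
    $asciiHost = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false) {
        throw new InvalidArgumentException("Cannot convert host: $host");
    }

    // Path: encode each segment separately so the "/" separators survive.
    // Caveat: an existing escape such as %20 would be encoded again as %2520.
    $segments = array_map('rawurlencode', explode('/', $path));

    // Query and fragment: only escape non-ASCII bytes, keep "?", "&", "=" intact.
    $rest = preg_replace_callback(
        '/[\x80-\xFF]/',
        static function (array $b): string {
            return rawurlencode($b[0]);
        },
        $rest
    );

    return $scheme . '://' . $asciiHost . $port . implode('/', $segments) . $rest;
}

echo url_to_ascii('https://täst.de:8118/news/zh-cn/新闻动态/2015/'), "\n";
// https://xn--tst-qla.de:8118/news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/
```

Since the result is still going to be handed to system(), wrapping it in escapeshellarg() before building the command line remains advisable; ASCII-only does not mean shell-safe.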
Upvotes: 1