user2064000


Convert unicode URL to ASCII

I'm writing a PHP application that accepts a URL from the user and then processes it by calling some binaries with system()*. To avoid the many complications that arise from this, I'm trying to convert the URL, which may contain Unicode characters, into a form that contains only ASCII characters.

Let's say I have the following URL:

https://täst.de:8118/news/zh-cn/新闻动态/2015/

Here two parts need to be dealt with: the hostname and the path.

How should I approach this problem?


*I'd rather not deal with system() calls and the resulting complexity, but given that the functionality is only available by calling binaries, I unfortunately have no choice.

Upvotes: 3

Views: 823

Answers (3)

user2064000


The following function percent-encodes every byte of the path that is not already a URL-safe ASCII character:

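function convertpath ($path) {
  $path1 = '';
  $len = strlen ($path);
  // Walk the string byte by byte; multi-byte UTF-8 characters become several %XX groups
  for ($i = 0; $i < $len; $i++) {
    if (preg_match ('/^[A-Za-z0-9\/?=+%_.~-]$/', $path[$i])) {
      // Already URL-safe: copy unchanged
      $path1 .= $path[$i];
    }
    else {
      $path1 .= urlencode ($path[$i]);
    }
  }
  return $path1;
}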
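
For illustration, the same per-byte idea can be sketched more compactly with preg_replace_callback(). The name convertpath_compact is hypothetical and not part of the answer above; the sketch assumes the input is UTF-8 and reuses the same set of characters that are left unencoded.

```php
<?php
// Hypothetical compact variant of convertpath(); the name is illustrative only.
function convertpath_compact(string $path): string
{
    // Without the /u modifier the pattern matches single bytes, so each byte
    // of a multi-byte UTF-8 character is encoded separately as %XX.
    return preg_replace_callback(
        '/[^A-Za-z0-9\/?=+%_.~-]/',
        function (array $m): string {
            return urlencode($m[0]);
        },
        $path
    );
}

echo convertpath_compact('/news/zh-cn/新闻动态/2015/'), "\n";
// prints /news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/
```

Like the loop version, this leaves an existing "%" untouched, so input that is already percent-encoded is not encoded twice.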

Upvotes: 0

Maltronic

Reputation: 1802

You could use PHP's iconv function:

$url = iconv("UTF-8", "ASCII//TRANSLIT", $url);

Upvotes: 0

Peter

Reputation: 16943

Split the URL on "/", convert the hostname with idn_to_ascii(), urlencode() the path segment that contains Unicode characters, and then join the parts back together:

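// For the example URL, index 2 is "host:port" and index 5 is the Unicode path segment
$parts = explode("/", $url);
// Split off the port so that only the hostname is passed to idn_to_ascii()
$hostport = explode(":", $parts[2], 2);
$hostport[0] = idn_to_ascii($hostport[0]);
$parts[2] = join(":", $hostport);
$parts[5] = urlencode($parts[5]);
$url = join("/", $parts);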
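
If the positions of the host and the Unicode segment are not known in advance, the same two steps can be applied without fixed array indices. What follows is only a rough sketch under several assumptions: the function name url_to_ascii is hypothetical, the input is UTF-8, the intl extension is installed (PHP 7.4 or later for the default idn_to_ascii() variant), and the URL has the shape scheme://host[:port][/path...] with no userinfo or IPv6 literal.

```php
<?php
// Hypothetical helper; not a complete URL parser.
// Returns null when the input does not fit the assumed shape.
function url_to_ascii(string $url): ?string
{
    // 1: scheme and "://", 2: host, 3: optional ":port", 4: everything after it
    if (!preg_match('~^([a-z][a-z0-9+.-]*://)([^/:?#]+)(:\d+)?(.*)$~is', $url, $m)) {
        return null;
    }

    // Convert the hostname to its Punycode (ASCII) form.
    $host = idn_to_ascii($m[2]);
    if ($host === false) {
        return null;
    }

    // Percent-encode every non-ASCII byte in the remainder; ASCII characters
    // such as "/", "?" and "%" are left as they are.
    $rest = preg_replace_callback(
        '/[\x80-\xFF]/',
        function (array $b): string {
            return rawurlencode($b[0]);
        },
        $m[4]
    );

    return $m[1] . $host . $m[3] . $rest;
}

echo url_to_ascii('https://täst.de:8118/news/zh-cn/新闻动态/2015/'), "\n";
// prints https://xn--tst-qla.de:8118/news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/
```

Since the result ends up in a system() call, it is still worth passing it through escapeshellarg() before building the command line.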

Upvotes: 1
