Reputation: 11324
In a PHP project I use the idn_to_utf8 function to convert domain names from Punycode to Unicode strings.
But sometimes this function returns the Punycode unchanged instead of the Unicode string.
Example:
echo idn_to_utf8('xn--fiq57vn0d561bf5ukfonh1o');
// Returns: xn--fiq57vn0d561bf5ukfonh1o
// It should return: 中島第２駐輪場
echo idn_to_utf8('xn--fiqu6mnndw87c3ucbt0a1ea684a');
// Returns: 中味鋺自転車置場 (correct)
There are libraries that convert this Punycode correctly (http://idnaconv.phlymail.de/index.php?encoded=xn--fiq57vn0d561bf5ukfonh1o&decode=%3C%3C+Decode&lang=de), but I would prefer to use a PHP function rather than a library.
Does anyone know what causes this problem?
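One way to see what the failing label actually contains is to run only the raw RFC 3492 decoding step. The sketch below is a minimal pure-PHP decoder for a single label; the function names (`punycode_decode_label`, `puny_adapt`, `puny_digit`, `cp_to_utf8`) are hypothetical and not part of PHP. It deliberately skips the IDNA/NAMEPREP checks that idn_to_utf8() applies and has no overflow protection beyond the code point range, so treat it as a diagnostic aid, not a validator.

```php
<?php
// Minimal RFC 3492 Punycode decoder for ONE label (sketch, hypothetical names).
// It only reverses the encoding; no IDNA/NAMEPREP validation is performed.
const PUNY_BASE = 36;
const PUNY_TMIN = 1;
const PUNY_TMAX = 26;
const PUNY_SKEW = 38;
const PUNY_DAMP = 700;
const PUNY_INITIAL_BIAS = 72;
const PUNY_INITIAL_N = 128;

// Bias adaptation function (RFC 3492, section 6.1).
function puny_adapt(int $delta, int $numPoints, bool $firstTime): int
{
    $delta = intdiv($delta, $firstTime ? PUNY_DAMP : 2);
    $delta += intdiv($delta, $numPoints);
    $k = 0;
    while ($delta > intdiv((PUNY_BASE - PUNY_TMIN) * PUNY_TMAX, 2)) {
        $delta = intdiv($delta, PUNY_BASE - PUNY_TMIN);
        $k += PUNY_BASE;
    }
    return $k + intdiv((PUNY_BASE - PUNY_TMIN + 1) * $delta, $delta + PUNY_SKEW);
}

// Digit value of one Punycode character: a-z => 0..25, 0-9 => 26..35.
function puny_digit(string $c): int
{
    $o = ord(strtolower($c));
    if ($o >= 0x61 && $o <= 0x7A) {
        return $o - 0x61;
    }
    if ($o >= 0x30 && $o <= 0x39) {
        return $o - 0x30 + 26;
    }
    throw new InvalidArgumentException("Invalid Punycode character '$c'");
}

// Encode one code point as UTF-8 without needing mbstring or intl.
function cp_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

function punycode_decode_label(string $label): string
{
    if (strncasecmp($label, 'xn--', 4) === 0) {
        $label = substr($label, 4); // accept the label with or without its ACE prefix
    }
    $n = PUNY_INITIAL_N;
    $i = 0;
    $bias = PUNY_INITIAL_BIAS;
    $output = [];

    // Basic (ASCII) code points before the last '-' are copied as-is.
    $pos = 0;
    $delim = strrpos($label, '-');
    if ($delim !== false) {
        for ($j = 0; $j < $delim; $j++) {
            $output[] = ord($label[$j]);
        }
        $pos = $delim + 1;
    }

    $len = strlen($label);
    while ($pos < $len) {
        // Read one generalized variable-length integer and add it to $i.
        $oldi = $i;
        $w = 1;
        for ($k = PUNY_BASE; ; $k += PUNY_BASE) {
            if ($pos >= $len) {
                throw new InvalidArgumentException('Truncated Punycode label');
            }
            $digit = puny_digit($label[$pos++]);
            $i += $digit * $w;
            $t = $k <= $bias ? PUNY_TMIN : ($k >= $bias + PUNY_TMAX ? PUNY_TMAX : $k - $bias);
            if ($digit < $t) {
                break;
            }
            $w *= PUNY_BASE - $t;
        }
        $count = count($output) + 1;
        $bias = puny_adapt($i - $oldi, $count, $oldi === 0);
        $n += intdiv($i, $count); // code point to insert
        $i %= $count;             // position to insert it at
        if ($n > 0x10FFFF) {
            throw new InvalidArgumentException('Decoded code point out of range');
        }
        array_splice($output, $i, 0, [$n]);
        $i++;
    }
    return implode('', array_map('cp_to_utf8', $output));
}

echo punycode_decode_label('xn--fiq57vn0d561bf5ukfonh1o'), "\n"; // 中島第２駐輪場
```

The decoded label is 中島第２駐輪場, whose fourth character is U+FF12 FULLWIDTH DIGIT TWO rather than the ASCII digit 2.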
Edit (solution and explanation): to summarize, the following code shows the problem:
echo idn_to_ascii('吉津第２自転車置場');
?><br /><?php
echo idn_to_utf8(idn_to_ascii('吉津第２自転車置場'));
?> (should be: 吉津第２自転車置場)<br /><?php
This code displays the following:
xn--2-958a11kws1a96p50fgxenr6afga
吉津第2自転車置場 (should be: 吉津第２自転車置場)
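To be more precise: before computing the Punycode of 吉津第２自転車置場, PHP first converts the string to 吉津第2自転車置場 (the fullwidth character "２" is a different code point from the ASCII "2"). So idn_to_ascii() cannot round-trip every Unicode string, because PHP maps certain characters to others first (here, the fullwidth ２ becomes the ASCII 2). The same mapping most likely explains the first example: xn--fiq57vn0d561bf5ukfonh1o decodes to a label that still contains the fullwidth ２, which idn_to_ascii() would never produce, so idn_to_utf8() rejects it and returns its input unchanged.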
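The mapping described above comes from Unicode compatibility normalization (NFKC), which the IDNA2003 NAMEPREP profile applies before encoding. If the intl extension is loaded (it is the extension that provides idn_to_ascii()), `Normalizer::normalize($name, Normalizer::FORM_KC)` performs the full NFKC mapping, and normalizing names that way before storing them brings the stored form closer to what idn_to_ascii() actually encodes. The pure-PHP sketch below reproduces only the fullwidth part of that mapping, to make the ２ to 2 step visible; the function `fold_fullwidth_ascii` is hypothetical and is not a substitute for NAMEPREP (it does no case folding or other mappings).

```php
<?php
// Hypothetical helper, NOT full NAMEPREP: folds the fullwidth ASCII block
// U+FF01..U+FF5E to U+0021..U+007E, which is what NFKC does for these
// characters. No case folding and no other mappings are applied.
function fold_fullwidth_ascii(string $s): string
{
    $out = preg_replace_callback('/[\x{FF01}-\x{FF5E}]/u', function (array $m): string {
        // These code points are always 3 bytes in UTF-8 (EF BC xx or EF BD xx).
        $b = $m[0];
        $cp = ((ord($b[0]) & 0x0F) << 12) | ((ord($b[1]) & 0x3F) << 6) | (ord($b[2]) & 0x3F);
        return chr($cp - 0xFEE0); // e.g. U+FF12 (２) - 0xFEE0 = U+0032 (2)
    }, $s);
    if ($out === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

echo fold_fullwidth_ascii('吉津第２自転車置場'), "\n"; // 吉津第2自転車置場
```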
Upvotes: 1
Views: 2047
Reputation: 356
Without PECL/intl or PECL/idn, I had trouble getting idn_to_utf8() to work at all!
As an alternative, the IdnaConv.net library works well for me. To convert a whole domain name, split it into labels and decode only the Punycode ones:
include(__DIR__ . '/IdnaConvert.php');
$IDNA = new \Mso\IdnaConvert\IdnaConvert();
$domain = 'xn--b1amarcd.xn--ehq889crwebw5c4qa.net'; // 'новини.三明治餐馆.net'
$parts = explode('.', $domain);
$utf8parts = [];
foreach ($parts as $part) {
    // Only labels carrying the ACE prefix "xn--" are Punycode-encoded.
    if (\substr($part, 0, 4) === 'xn--') {
        $utf8parts[] = $IDNA->decode($part);
    } else {
        $utf8parts[] = $part;
    }
}
$utf8domain = implode('.', $utf8parts);
Upvotes: 0
Reputation: 5754
This works fine. I think fullwidth characters such as [Ａ-Ｚ０-９] cannot be used.
echo idn_to_utf8('xn--2-kq6aw43af1e4y9boczagup'); // 中島第2駐輪場
In fact, Chrome automatically converts 中島第２駐輪場.com into 中島第2駐輪場.com before accessing it.
UPDATE:
The responsible normalization rule appears to be NAMEPREP: https://www.nic.ad.jp/ja/dom/idn.html
UPDATE:
That seems to be invalid...
Upvotes: 1