Reputation: 20987
I'm trying to figure out what is valid for domain name registration. Apparently some Unicode characters are translated weirdly by the browser, while others are not translated at all.
This address:
http://xn--ippleman-dmj.com/
Translates to:
http://Nippleman.com/
and
http://xn--ggle-0nda.com/
should translate to:
http://gοοgle.com/
but for some reason the browser prevents it.
How is the format for these domains determined, and what is or isn't blocked by the browser?
http://xn--ippleman-dmj.com/
is a valid URL, while http://www.gοοgle.com
is not. Yet Chrome replaces the Unicode with Punycode only for the second URL.
Upvotes: 2
Views: 5017
Reputation: 13166
First, to your question. A valid domain name must conform to RFC 1035 regardless of browser: the whole domain name must not exceed 255 octets of ASCII characters, and it is case-insensitive. Even an IDN must comply with this standard. So, to represent IDNs, later RFCs introduced the Punycode 'xn--' conversion.
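To make the 'xn--' conversion concrete, here is a minimal sketch that decodes the two hostnames from the question using Python's built-in `punycode` codec. The helper names `decode_label` and `decode_hostname` are made up for this illustration, and the sketch skips the normalization and validation steps that full IDNA processing performs.

```python
def decode_label(label: str) -> str:
    """Decode a single DNS label; encoded labels carry the 'xn--' prefix."""
    if label.lower().startswith("xn--"):
        # Strip the prefix and let the stdlib codec do the Punycode decoding.
        return label[4:].encode("ascii").decode("punycode")
    return label  # plain ASCII label, nothing to decode


def decode_hostname(hostname: str) -> str:
    """Decode every label of a dotted hostname independently."""
    return ".".join(decode_label(part) for part in hostname.split("."))


for host in ("xn--ippleman-dmj.com", "xn--ggle-0nda.com"):
    shown = decode_hostname(host)
    # List the non-ASCII code points so look-alike letters become visible.
    non_ascii = [f"U+{ord(ch):04X}" for ch in shown if ord(ch) > 127]
    print(host, "->", shown, non_ascii)
```

Printing the code points matters because the decoded text alone can be visually indistinguishable from ASCII: the second hostname decodes to `gοοgle.com`, where both "o" letters are U+03BF (Greek small letter omicron).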
Then there is the proof of concept of the IDN homograph attack. Unicode.org currently updates and maintains a list of confusable characters. You can download the current version of TR39 and play around with it.
Previously, Chrome and Firefox would translate a domain name starting with xn--
to the corresponding Unicode characters found in the browser's font cache. If the browser couldn't find the font, it would display the raw 'xn--' Punycode domain name.
This is a known issue. Firefox even has a manual option to enable or disable Punycode domain name display (the network.IDN_show_punycode preference in about:config). Google decided to remove the conversion from version 58 onward, while Firefox 53 will follow by making Punycode display the default.
I don't know whether Google will still show Unicode characters that are not in TR39, or simply remove the Punycode-to-Unicode conversion for all domains.
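To get a feel for why a label like the one in the question looks suspicious, here is a rough sketch of a mixed-script check. It is not Chrome's or Firefox's actual display policy, and it is not an implementation of TR39: as a crude assumption, it treats the first word of each character's Unicode name (for example `LATIN` or `GREEK`) as a stand-in for its script. The function names are invented for this example.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Collect approximate script names for the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Assumption: the first word of the character name approximates
            # the script, e.g. 'GREEK SMALL LETTER OMICRON' -> 'GREEK'.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


print(scripts_in("g\u03bf\u03bfgle"))          # Latin letters plus Greek omicrons
print(looks_mixed_script("google"))            # plain ASCII label
print(looks_mixed_script("g\u03bf\u03bfgle"))  # the label from the question
```

A heuristic like this only catches mixing within one label. A label written entirely in a single non-Latin script that happens to resemble a Latin word would pass, which is the kind of case the TR39 confusables data is meant to address.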
Upvotes: 1
Reputation:
It appears that you're trying to do an IDN homograph attack. The Wikipedia page nicely explains what Chrome is doing to stop you.
Upvotes: 2