bukzor

Reputation: 38532

Is it standard to UTF-8-encode and percent-escape our international URLs?

I see that many sites (Amazon, Wikipedia, others) use UTF-8-encoded, percent-escaped Unicode in their URLs, and that those URLs are prettified by (at least) Chrome.

For example, we would represent http://ja.wikipedia.org/wiki/メインページ as http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8 when writing our HTTP headers, and Chrome and Firefox seem to handle this gracefully. (I didn't test on IE.)
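
To make the encoding concrete, here is a minimal sketch of the round trip using Python's standard `urllib.parse` module. The variable names are only illustrative, and this shows one way to produce the escaped form, not necessarily how any particular site does it.

```python
from urllib.parse import quote, unquote

# The path portion of the Wikipedia URL from the example above.
path = "/wiki/メインページ"

# quote() encodes the text as UTF-8 by default and percent-escapes
# every byte that is not a letter, digit, or one of "_.-~";
# "/" is kept literal because it is listed in `safe`.
escaped = quote(path, safe="/")
print("http://ja.wikipedia.org" + escaped)

# unquote() reverses the process, decoding the bytes as UTF-8.
restored = unquote(escaped)
print(restored == path)
```

Each Japanese character becomes three escaped bytes, because each one occupies three bytes in UTF-8.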

Is there a governing standard for this behavior? Or is it strictly a de facto standard? Or is it completely non-standard?

I'd really like to see a link to the defining paragraph of some RFC.

Upvotes: 2

Views: 400

Answers (2)

Remy Lebeau

Reputation: 598001

RFC 3987 is the standard for internationalized URIs/URLs, known as IRIs. The base URI standard, RFC 3986, only allows a subset of US-ASCII characters in a URI, so it cannot carry Unicode characters directly. Anyone not using IRIs yet has to come up with their own way of encoding unsupported characters for their own needs. Percent-encoding UTF-8 octets is one way, but it is certainly not the only way that is actually in use.

Upvotes: 0

bukzor

Reputation: 38532

The URI standard (RFC 3986, section 2.5) says:

When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded.

That seems pretty definitive.

I'm still unsure about when it was ratified, or the current browser support.
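
For illustration, the two steps in the quoted paragraph could be sketched in Python as follows. This is a rough sketch, not a reference implementation: `percent_encode_component` is a hypothetical helper name, and the unreserved set is taken to be the letters, digits, and `-._~` listed in RFC 3986.

```python
import string

# Assumed unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~".
# Stored as byte values so it can be compared against UTF-8 octets.
UNRESERVED = frozenset(
    (string.ascii_letters + string.digits + "-._~").encode("ascii")
)


def percent_encode_component(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then escape non-unreserved octets."""
    pieces = []
    # Step 1: encode the characters as UTF-8 octets.
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            # Unreserved octets are emitted as-is.
            pieces.append(chr(octet))
        else:
            # Step 2: every other octet becomes "%" plus two hex digits.
            pieces.append(f"%{octet:02X}")
    return "".join(pieces)


print(percent_encode_component("メインページ"))
```

Note that this helper escapes everything outside the unreserved set, including `/`, so it is meant for a single component such as one path segment, not for a whole URL.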

Upvotes: 1
