Gray

Reputation: 2333

What character encoding are URLs supposed to be in?

I've seen browsers encode é, for example, as either %E9 (ISO-8859-1) or %C3%A9 (UTF-8), and I'm not sure which one is correct. Is there a way to tell which encoding a request is intended to be interpreted in?

Upvotes: 1

Views: 102

Answers (2)

unor

Reputation: 96727

The URI scheme decides how the URI should be encoded.

For new URI schemes, UTF-8 is recommended:

Unless there is some compelling reason for a particular scheme to do otherwise, translating character sequences into UTF-8 (RFC 2279) and then subsequently using the %HH encoding for unsafe octets is recommended.

http and https do not specify how URIs should be encoded.

For IRIs, the corresponding URI has to be encoded in UTF-8:

The URI corresponding to the IRI in question has to encode original characters into octets using UTF-8.
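To make the two forms from the question concrete, here is a minimal sketch using Python's standard urllib.parse. It only illustrates that the same character yields different percent-encoded octets depending on the character encoding chosen; the variable names are mine, not part of any spec.

```python
from urllib.parse import quote, unquote

text = "é"

# Percent-encode the same character under two different character encodings.
utf8_form = quote(text, encoding="utf-8")      # two octets: %C3%A9
latin1_form = quote(text, encoding="latin-1")  # one octet:  %E9

print(utf8_form, latin1_form)

# Decoding only round-trips when the decoder assumes the same encoding.
print(unquote(utf8_form, encoding="utf-8"))
print(unquote(latin1_form, encoding="latin-1"))

# Assuming the wrong encoding does not fail here; it silently yields mojibake.
wrong_guess = unquote(utf8_form, encoding="latin-1")
print(wrong_guess)
```

Nothing in the percent-encoded string itself says which encoding was used, which is why the receiver has to know or guess.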

Upvotes: 1

Julian Reschke

Reputation: 42065

No. It's entirely up to the server to decide how to encode non-ASCII characters. Unfortunately.
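Since the request carries no encoding label, a server that has to accept both forms can only guess. One common heuristic, sketched here under the assumption that UTF-8 and ISO-8859-1 are the only candidates, is to try strict UTF-8 first and fall back to ISO-8859-1. The function name decode_path_segment is hypothetical, and this is not something HTTP prescribes.

```python
from urllib.parse import unquote_to_bytes


def decode_path_segment(segment: str) -> str:
    """Guess the encoding of a percent-encoded segment (heuristic only)."""
    # Turn %HH escapes back into raw octets without assuming any charset.
    raw = unquote_to_bytes(segment)
    try:
        # Strict UTF-8 rejects most ISO-8859-1 byte sequences.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: every byte sequence is valid ISO-8859-1.
        return raw.decode("iso-8859-1")


print(decode_path_segment("caf%C3%A9"))
print(decode_path_segment("caf%E9"))
```

The guess can be wrong: some ISO-8859-1 byte sequences happen to be valid UTF-8 as well, so this only reduces the ambiguity rather than removing it.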

Upvotes: 3
