Reputation: 2405
I know that for ASCII characters the URL encoding is just a percent sign followed by the hex number that corresponds to the character. But for characters outside that range, the URL encoding consists of two or more %hex-number sequences.
For example, for the character that corresponds to hex value 56CE, the URL encoding according to the standard .NET/Java APIs is not %56CE but "%e5%9b%8e".
So if we know the hex value for a character outside the ASCII range, how is the URL encoding calculated? In other words, how do e5, 9b, 8e come out of 56CE? I tried converting to binary and did see a pattern for the last two numbers (%9b, %8e), but I have no idea where the %e5 comes from.
Upvotes: 1
Views: 218
Reputation: 595367
You have to encode the Unicode codepoints into charset bytes first, and then you can URL-encode those bytes. In your example, E5 9B 8E are the UTF-8 encoded bytes of Unicode codepoint U+56CE, and %E5%9B%8E is the URL-encoded form of those UTF-8 bytes.
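To show where each byte comes from, here is a minimal Python sketch of the two steps. The helper names `utf8_bytes` and `percent_encode` are made up for illustration, and `percent_encode` escapes every byte, whereas real URL encoders leave unreserved ASCII characters as they are. For a codepoint in the range U+0800 to U+FFFF, UTF-8 uses three bytes of the form 1110xxxx 10xxxxxx 10xxxxxx, so the top 4 bits of 56CE land in the first byte (giving E5) and the remaining 12 bits are split 6 and 6 (giving 9B and 8E).

```python
from urllib.parse import quote


def utf8_bytes(code_point: int) -> bytes:
    """Encode a single Unicode codepoint as UTF-8 by hand (illustrative helper)."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def percent_encode(data: bytes) -> str:
    """Write every byte as %XX (illustrative; escapes all bytes, even safe ones)."""
    return "".join(f"%{b:02X}" for b in data)


encoded = utf8_bytes(0x56CE)
print(encoded.hex(" "))         # the three UTF-8 bytes
print(percent_encode(encoded))  # the percent-encoded form

# The standard library performs both steps (UTF-8, then percent-encoding) in one call.
print(quote(chr(0x56CE)))
```

The hand-rolled result should match both `chr(0x56CE).encode("utf-8")` and the output of `urllib.parse.quote`, which defaults to UTF-8.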
Upvotes: 2