HUA Di

Reputation: 901

How do browsers encode URLs?

I'm testing how Firefox encodes characters in URLs, but the results confuse me.

HTML code:

<html lang="zh-CN">
<head>
<title>some Chinese character</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<img src="http://localhost/xxx" />
</body>
</html>

Here xxx stands for some Chinese characters. These characters must be percent-encoded (in a form like %xx) to be transmitted over HTTP.

First, I saved the source file as UTF-8 and opened the HTML file in Firefox. The img tag triggers a request, and the "xxx" characters were encoded as UTF-8.

I then changed the meta tag to <meta http-equiv="Content-Type" content="text/html; charset=gbk">, but nothing changed.

Second, I saved the source file as ANSI (probably GBK or GB2312).

With charset=gbk, the characters were still encoded as UTF-8.

But with charset=utf8, the characters were encoded as GBK. Also, the other Chinese characters, such as the string in the title, were not displayed correctly.

How can I control the browser's encoding behavior?

Upvotes: 1

Views: 2737

Answers (1)

Esailija

Reputation: 140236

UTF-8 is the standard for URL encoding. If you physically save your source file as GBK but declare utf-8 in the Content-Type, you are lying to the browser and will get inconsistent or broken results.
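To illustrate the mismatch, here is a minimal Python sketch of what happens when bytes saved in one encoding are decoded under a different declared charset. The helper name `decode_with_declared_charset` and the sample character are hypothetical, chosen only for this illustration. It models the decoding step alone and does not claim to reproduce Firefox's exact behavior.

```python
def decode_with_declared_charset(text, saved_as, declared_as):
    """Encode text as it would be saved on disk, then decode it the way
    a reader trusting the declared charset would (hypothetical helper)."""
    raw = text.encode(saved_as)
    # errors="replace" substitutes U+FFFD for byte sequences that are
    # invalid in the declared charset instead of raising an exception.
    return raw.decode(declared_as, errors="replace")


sample = "中"  # hypothetical sample character

# File saved and declared consistently: the text survives intact.
print(decode_with_declared_charset(sample, "utf-8", "utf-8"))

# File saved as GBK but declared as UTF-8: the bytes are not valid
# UTF-8, so the decoded text contains replacement characters.
print(decode_with_declared_charset(sample, "gbk", "utf-8"))
```

This is probably why the title displayed incorrectly in the GBK-saved, utf-8-declared case: the page's bytes and its declared charset did not agree.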

RFC 3986 (section 2.5) says:

When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded. For example, the character A would be represented as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as "%C3%80", and the character KATAKANA LETTER A would be represented as "%E3%82%A2".
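The examples in that passage can be checked with a short Python sketch using the standard library's `urllib.parse.quote`. The wrapper name `percent_encode` and the extra Chinese sample character are assumptions made for illustration. The GBK call is included only to show how the same character yields different octets under a different encoding.

```python
from urllib.parse import quote


def percent_encode(text, encoding="utf-8"):
    """Encode text to octets in the given encoding, then percent-encode
    every octet that is not an unreserved character (hypothetical wrapper)."""
    return quote(text, safe="", encoding=encoding)


print(percent_encode("A"))           # A
print(percent_encode("\u00C0"))      # %C3%80  (LATIN CAPITAL LETTER A WITH GRAVE)
print(percent_encode("\u30A2"))      # %E3%82%A2  (KATAKANA LETTER A)

# The same Chinese character under two encodings (sample chosen for illustration).
print(percent_encode("中"))          # %E4%B8%AD  (UTF-8 octets)
print(percent_encode("中", "gbk"))   # %D6%D0  (GBK octets)
```

A server that expects UTF-8 would not recognize `%D6%D0` as the same character, which is one reason to keep the file encoding, the declared charset, and the URL encoding consistent.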

Upvotes: 2
