firebird

Reputation: 3511

How come the following characters are displayed in ISO-8859-1?

I have the following html:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body>
    会意字 / 會意字 huìyìzì
</body>
</html>

When I open it in Firefox, it displays the Chinese characters just fine. How come it works with the ISO-8859-1 character set? I thought you needed UTF-8?

Upvotes: 0

Views: 60

Answers (2)

Jukka K. Korpela

Reputation: 201588

The most probable explanation is that the document is in fact UTF-8 encoded and the browser treats it that way, despite the meta tag. According to the HTML5 encoding sniffing algorithm, which largely reflects browser behavior, the meta tag is ignored if any of the following is true:

  • The user has instructed (via e.g. a View → Encoding command) the browser to use a specific encoding.
  • The page starts with bytes that represent the Byte Order Mark in UTF-8 or UTF-16. In practice, it starts that way if the file was saved in an editor with a command like “Save as UTF-8 (with BOM)”.
  • HTTP headers specify an encoding in a Content-Type header.

You can find out which of these is the cause by using e.g. Rex Swain’s HTTP viewer. It lets you see both the HTTP response headers and the actual data as bytes. Developer Tools in browsers have similar features.
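If you would rather check this from a script, here is a minimal Python sketch of the two causes that can be detected from the response itself (a BOM and an HTTP charset). The function names `reasons_meta_is_ignored` and `check_url` are made up for this illustration, the charset regex is a simplification of real header parsing, and a user-selected encoding in the browser cannot be detected this way.

```python
import codecs
import re
from typing import List, Optional
from urllib.request import urlopen

# Byte Order Marks that take precedence over a <meta> declaration.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

# Simplified pattern for a charset parameter in a Content-Type header.
CHARSET_RE = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def reasons_meta_is_ignored(body: bytes, content_type: Optional[str]) -> List[str]:
    """Return the detectable reasons a browser would ignore the meta tag."""
    reasons = []
    for bom, name in BOMS:
        if body.startswith(bom):
            reasons.append(f"body starts with a {name} BOM")
            break
    if content_type:
        match = CHARSET_RE.search(content_type)
        if match:
            reasons.append(f"HTTP Content-Type declares charset={match.group(1)}")
    return reasons


def check_url(url: str) -> List[str]:
    """Fetch a page and run the checks above (hypothetical helper, needs network)."""
    with urlopen(url) as response:
        head = response.read(1024)  # the BOM, if any, is in the first bytes
        return reasons_meta_is_ignored(head, response.headers.get("Content-Type"))
```

For a file on disk, reading it in binary mode and passing the bytes with `content_type=None` is enough to test for a BOM. An empty list means neither of these two causes applies, which leaves the manual encoding override as the remaining candidate from the list above.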

Upvotes: 1

Quentin

Reputation: 943579

I can't reproduce your successful rendering:

[Screenshot: the page not rendering correctly in Firefox]

… but HTML 5 defines a fairly complex character encoding detection method which doesn't pay any attention to <meta> until step 9.

In general, you should avoid encodings other than UTF-8 and definitely should not lie about the encoding of the document.
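To illustrate what a mismatched declaration does, here is a small Python sketch that assumes the file was saved as UTF-8 and is then decoded as ISO-8859-1. Python's `latin-1` codec is used as a stand-in; browsers actually treat the `ISO-8859-1` label as windows-1252, so the exact garbage on screen differs slightly, but the effect is the same.

```python
text = "会意字"

# These characters do not exist in ISO-8859-1, so the file cannot
# really be stored in that encoding.
try:
    text.encode("latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False

utf8_bytes = text.encode("utf-8")        # what an editor saving as UTF-8 writes
misread = utf8_bytes.decode("latin-1")   # what a wrong ISO-8859-1 label yields

print(representable)          # False
print(len(text), len(misread))  # 3 characters become 9
print(repr(misread))          # mojibake, one character per byte

# The bytes themselves are intact; decoding them correctly recovers the text.
recovered = misread.encode("latin-1").decode("utf-8")
```

Each Chinese character here takes three bytes in UTF-8, and a single-byte encoding turns every byte into its own character, which is why the page shows accented letters and symbols instead of 会意字 whenever the browser really does follow the meta tag.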

Upvotes: 1
