Chad Harrison

Reputation: 2858

What is the default character encoding for HTML?

For some reason, the plain text character on the HTML side is being displayed as –. The only thing I can think of that could cause this is the character encoding. My guess is that it's UTF-8, but I'm not sure how I'm ending up with the weird characters. Is there an explanation?

What I mean by default is if the charset isn't specified.

Upvotes: 9

Views: 5436

Answers (3)

Jon Hanna

Reputation: 113242

That certainly looks like UTF-8 being interpreted as something else.
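To see how this kind of garbling arises, here is a small Python illustration (not part of the original answer). It assumes the character in question is an en dash (U+2013) and that the "something else" is Windows-1252 (Python's `cp1252` codec). Encoding the dash as UTF-8 gives three bytes, and decoding those bytes as Windows-1252 turns each byte into its own character. Reversing the two steps recovers the dash.

```python
# Illustrative sketch: UTF-8 bytes misread as Windows-1252.
dash = "\u2013"  # EN DASH, assumed for illustration

utf8_bytes = dash.encode("utf-8")       # three bytes: E2 80 93
garbled = utf8_bytes.decode("cp1252")   # each byte becomes a separate character

# Undo the misreading: back to the original bytes, then decode them correctly.
repaired = garbled.encode("cp1252").decode("utf-8")

print(utf8_bytes)                        # b'\xe2\x80\x93'
print([hex(ord(c)) for c in garbled])    # ['0xe2', '0x20ac', '0x201c']
print(repaired == dash)                  # True
```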

HTML doesn't have a default. The encoding is picked up from the headers of the transfer protocol (normally HTTP) or, failing that, from a BOM, from meta elements or, in the case of XHTML, from the XML declaration. In the absence of any of those, the user agent guesses.
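To make that lookup order concrete, here is a simplified Python sketch of how a user agent might pick an encoding. It is not the full algorithm from the HTML standard. The function name `sniff_encoding`, the regular expressions standing in for the spec's meta prescan, and the `windows-1252` fallback are all assumptions. The sketch also ignores the XML declaration and user overrides. One difference from the order above: the current WHATWG HTML standard gives a BOM priority over the HTTP header, so the sketch checks the BOM first.

```python
import re
from typing import Optional

# Byte-order marks and the encodings they indicate.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)

# Rough stand-in for the spec's meta prescan: finds charset=... inside a <meta> tag,
# covering both <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE)

# charset parameter of an HTTP Content-Type header value.
HEADER_CHARSET = re.compile(r"charset\s*=\s*\"?([^\";\s]+)", re.IGNORECASE)


def sniff_encoding(body: bytes, content_type: Optional[str] = None,
                   fallback: str = "windows-1252") -> str:
    """Hypothetical helper: choose an encoding for an HTML byte stream."""
    # 1. A BOM wins (current WHATWG order).
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 2. Transport layer: charset parameter of the HTTP Content-Type header.
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # 3. A meta element near the start of the document (only the first 1024 bytes).
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing declared: the user agent guesses. This fallback is an assumption.
    return fallback
```

For example, `sniff_encoding(b'<meta charset="utf-8">')` returns `"utf-8"` from the meta element, while a page with no BOM, no header charset and no meta element gets the assumed fallback.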

HTTP has a default of ISO-8859-1, which even one HTML spec described as having "proved useless" [source] (and that doesn't even go into the fact that a large amount of content labelled as ISO-8859-1 is actually CP-1252).
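The ISO-8859-1 versus CP-1252 confusion shows up in the byte range 0x80 to 0x9F. ISO-8859-1 maps those bytes to invisible C1 control characters. Windows-1252 puts printable characters such as curly quotes and dashes there instead. Browsers that follow the WHATWG Encoding Standard treat the label "iso-8859-1" as windows-1252 for this reason. The sample bytes below are an illustrative assumption:

```python
# Windows-1252 bytes for curly-quoted text followed by an en dash (illustrative sample).
data = b"\x93quoted\x94 \x96 text"

as_latin1 = data.decode("latin-1")   # 0x93, 0x94 and 0x96 become C1 control characters
as_cp1252 = data.decode("cp1252")    # the same bytes become curly quotes and an en dash

# ascii() keeps the output readable on consoles that can't print these characters.
print(ascii(as_latin1))   # '\x93quoted\x94 \x96 text'
print(ascii(as_cp1252))   # '\u201cquoted\u201d \u2013 text'
```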

Hence: forget about defaults. Always set your HTTP headers and your meta elements (the latter in case the page is saved as a file).

And always do so as UTF-8. Anything else in this day and age is just an act of masochism.
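Here is a minimal sketch of "set both, and set them to UTF-8" in practice. It is a hypothetical WSGI application (the Python standard interface from PEP 3333). It sends the charset in the Content-Type header, also declares it with a meta element, and encodes the body as UTF-8 so the bytes match both declarations. The page content and the name `app` are placeholders.

```python
# Placeholder page; the meta element covers the case where the HTML is saved to disk.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Encoding demo</title>
</head>
<body><p>An en dash: \u2013</p></body>
</html>
"""


def app(environ, start_response):
    """Hypothetical WSGI app that labels and encodes its output consistently."""
    body = PAGE.encode("utf-8")  # the bytes must really be UTF-8 to match the labels
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),  # HTTP-level declaration
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the result. The key point is that the header, the meta element and the actual bytes all agree.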

Upvotes: 10

powerbuoy

Reputation: 12838

The <!DOCTYPE> doesn't set a character encoding; the meta element together with the (newly standardized) charset attribute does. If it's absent, I'm not entirely sure how the browser determines the encoding.

I believe the problem you're having, though, is that your page is saved in one encoding and served in another.

Just make sure you set <meta charset="utf-8"/>, make sure your document is in fact saved as UTF-8, and it should work.
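One way to check that "your document is in fact UTF-8" is to decode its raw bytes strictly and see whether that fails. The Python sketch below does that. If the check fails, it re-encodes the bytes from an assumed source encoding. Windows-1252 is only a guess here, so substitute whatever your editor actually used. The function names `is_utf8` and `to_utf8` are hypothetical.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (plain ASCII also passes)."""
    try:
        data.decode("utf-8")  # strict mode: any invalid sequence raises
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes as UTF-8, assuming they are currently in source_encoding."""
    if is_utf8(data):
        return data  # already valid UTF-8; leave untouched
    # May raise UnicodeDecodeError if the guess is wrong, e.g. bytes that
    # Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    return data.decode(source_encoding).encode("utf-8")
```

To apply it to a saved page, read the file with `pathlib.Path.read_bytes()`, pass the bytes through `to_utf8`, and write the result back with `write_bytes()`. Working on raw bytes avoids letting Python's own default text encoding hide the mismatch.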

Upvotes: 4

LazyZebra

Reputation: 1113

I use the default that Eclipse for PHP provides, and have had no problems:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
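A caveat with the ISO-8859-1 template above is that this encoding only covers U+0000 to U+00FF. A character like the en dash from the question has no representation in it. The Python check below uses an illustrative sample string:

```python
text = "2010\u20132012"  # sample string containing an EN DASH

try:
    text.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False  # U+2013 is outside ISO-8859-1's 256 code points

cp1252_bytes = text.encode("cp1252")  # Windows-1252 does have it, at byte 0x96
utf8_bytes = text.encode("utf-8")     # UTF-8 can encode any Unicode character

print(fits_latin1)    # False
print(cp1252_bytes)   # b'2010\x962012'
print(utf8_bytes)     # b'2010\xe2\x80\x932012'
```

Whichever label you use, it only works if it matches the bytes actually on disk. An editor that saves as Windows-1252 under an ISO-8859-1 label mostly gets away with it only because browsers treat the two as the same.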

Upvotes: -4
