Reputation: 10799
I am working with a large number of HTML files that are mostly encoded as UTF-8. They contain accented characters galore, as many are in French. I have been converting them to HTML entities as I go, but I noticed that even in IE5.5 (according to IETester) the unconverted accented characters display properly.
Should I be concerned with character display and convert them all to HTML entities just to be on the safe side?
Upvotes: 17
Views: 7373
Reputation: 201528
There is normally no reason to use entities for characters like accented letters. Using them is valid, but it makes the source code harder to read and may therefore cause errors.
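To illustrate that an entity is only an alternative spelling of the same character, here is a small sketch using Python's standard `html` module (chosen here only for illustration): once parsed, a named entity, a numeric reference and the literal letter all yield the same character.

```python
import html

# Three spellings of the same letter: named entity, numeric reference, literal.
spellings = ["&eacute;", "&#233;", "é"]

# html.unescape resolves character references the way an HTML parser would.
decoded = [html.unescape(s) for s in spellings]
print(decoded)  # ['é', 'é', 'é']
```

The parsed result is identical; only the source is harder to read with entities ("Acc&egrave;s r&eacute;serv&eacute;" versus "Accès réservé").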
However, in some cases the entities are needed. The reasons are not related to browsers but to the authoring side. In particular, if you need to edit the files using an editor or an authoring program that does not handle accented letters well, you may find entities useful. The same applies if the data has to pass through some software that has similar problems. And in some cases, you need to work within an environment where you have no control over HTTP headers and the headers specify an encoding that does not let you enter all characters directly.
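For the last case, where the declared encoding cannot hold every character, one approach is to write the file in the declared encoding and convert only the characters it cannot represent. A minimal Python sketch, assuming the header declares ISO-8859-1 (the helper name `encode_for_charset` is hypothetical): "é" exists in ISO-8859-1, but "œ" does not.

```python
import html

def encode_for_charset(text: str, charset: str) -> bytes:
    # Encode in `charset`; characters it cannot represent become
    # numeric character references such as &#339;.
    return text.encode(charset, errors="xmlcharrefreplace")

result = encode_for_charset("œuvre éditée", "iso-8859-1")
print(result)  # b'&#339;uvre \xe9dit\xe9e'

# Decoding with the declared charset and resolving references restores the text.
print(html.unescape(result.decode("iso-8859-1")))  # œuvre éditée
```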
Upvotes: 5
Reputation: 498904
If the files are UTF-8 encoded, you should set the Content-Type header to text/html; charset=UTF-8 and have an equivalent meta tag on the page:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This gives the browser all the information for displaying UTF-8 characters correctly. There is no need to encode accented characters.
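As a sketch of the server side, assuming the pages are served by a Python WSGI application (the `app` function and page content are hypothetical), the header and the meta tag declare the same charset and the body is encoded to match:

```python
PAGE = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Café</title>
</head>
<body><p>Accès réservé</p></body>
</html>"""

def app(environ, start_response):
    # Encode the body with the same charset declared in the header and meta tag.
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For a quick local check, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it; other servers have their own way of setting the same header.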
Upvotes: 18
Reputation: 1888
The thing you need to remember is that UTF-8 can represent the accented letters of French, Portuguese, Spanish, etc., so they will display properly as long as the page declares UTF-8 and the browser is also using UTF-8 for the page.
The problem arises when someone visits the page with a browser that forces another charset: the unencoded characters will then be garbled. This happens from time to time here in Brazil, where many browsers are not set to detect the charset automatically and are instead set to ISO-8859-1, which is common here.
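A short Python sketch of what goes wrong: UTF-8 bytes read as ISO-8859-1 turn each accented letter into two wrong characters, whereas an entity is plain ASCII and reads the same under either charset.

```python
raw = "Français".encode("utf-8")     # ç is stored as two bytes: b'\xc3\xa7'
misread = raw.decode("iso-8859-1")   # what a browser forced to ISO-8859-1 shows
print(misread)  # FranÃ§ais

entity = "Fran&ccedil;ais".encode("utf-8")
# ASCII-only bytes decode identically under both charsets.
print(entity.decode("iso-8859-1") == entity.decode("utf-8"))  # True
```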
So, where possible, encode all of your "special" characters as entities for the widest possible compatibility.
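One way to do that conversion in bulk, sketched in Python under the assumption that the source files are valid UTF-8 (the function names `to_ascii_entities` and `convert_file` are hypothetical): encoding to ASCII with the `xmlcharrefreplace` error handler turns every non-ASCII character into a numeric character reference and leaves the markup untouched.

```python
from pathlib import Path

def to_ascii_entities(text: str) -> str:
    # Every character outside ASCII becomes &#NNN; (decimal code point).
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

def convert_file(path: Path) -> None:
    # Assumes the file is UTF-8; read_text raises UnicodeDecodeError otherwise.
    text = path.read_text(encoding="utf-8")
    path.write_text(to_ascii_entities(text), encoding="ascii")

print(to_ascii_entities("<p>Déjà vu</p>"))  # <p>D&#233;j&#224; vu</p>
```

This produces numeric references (`&#233;`) rather than named ones (`&eacute;`); browsers treat both the same.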
I hope that helps!
Upvotes: 2