Reputation: 2907
I have a page with a UTF-8 header:
<meta charset="utf-8" />
And in the page I use the umbraco dictionary to fetch content in various languages. When I print this in German on the page it appears fine:
<h1>@library.GetDictionaryItem("A")</h1>
resolves to:
<h1>Ä</h1>
in German
However, if I enter it via a script:
<script type="text/javascript" charset="utf-8">
var a = "@library.GetDictionaryItem("A")";
alert(a);
</script>
The alert prints:
&#228;
If I do
<script type="text/javascript" charset="utf-8">
var a = "Ä";
alert(a);
</script>
The alert prints:
Ä
So what could explain this behaviour, and how can I fix the alert? As far as I can see everything is UTF-8, and the dictionary and the page encoding are fine. The problem happens within JavaScript.
From what I can see from the table here, JavaScript resolves the character into its numeric value. I tried escape, encodeURI, decodeURI, etc., with no luck.
chr   HexCode   Numeric   HTML entity   escape(chr)   encodeURI(chr)
ä     \xE4      &#228;    &auml;        %E4           %C3%A4
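For illustration, roughly what those calls produce for the ä character versus the literal entity text the alert is showing (a quick sketch, nothing Umbraco-specific):
var character = "\u00E4";   // the ä character itself
var entityText = "&#228;";  // the literal text the alert displays

alert(escape(character));     // "%E4"
alert(encodeURI(character));  // "%C3%A4"
alert(escape(entityText));    // "%26%23228%3B" - it just percent-encodes & # ;
alert(decodeURI(entityText)); // "&#228;" - unchanged, there is nothing to decode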
Upvotes: 1
Views: 11957
Reputation: 1074028
(FWIW: Character entity &#228; is ä, not Ä.)
This has nothing to do with character encoding. You're outputting an HTML entity to a JavaScript string, and then asking the browser to display that JavaScript string without doing anything to interpret HTML (via alert). It's exactly as though you actually typed:
<h1>&#228;</h1>
...(which will show ä on the page), and
<script>
var a = "ä";
alert(a);
</script>
...which won't. The HTML entity isn't being used anywhere that understands HTML entities; alert doesn't interpret HTML.
But if you did this:
<script>
var a = "ä";
var div = document.createElement('div');
div.innerHTML = a;
document.body.appendChild(div);
</script>
...you'd see the character on the page, because we're giving the entity to something (innerHTML) that will interpret HTML. And so if you make that first line:
var a = "@library.GetDictionaryItem("A")";
...and then use a in an HTML context (as above), you'll get the ä in the document.
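A sketch of that, reusing the question's Razor call; it also reads the decoded text back out of the element in case a plain string is still needed (e.g. for the alert):
<script type="text/javascript">
    var a = "@library.GetDictionaryItem("A")";

    // Give the entity to something that interprets HTML...
    var div = document.createElement('div');
    div.innerHTML = a;
    document.body.appendChild(div); // the ä shows up in the document

    // ...and, if the plain character is still wanted, read it back out
    var decoded = div.textContent || div.innerText;
    alert(decoded); // ä
</script>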
If you always get a decimal numeric character entity (like &#228;) from Umbraco, since those define Unicode code points and JavaScript (mostly) uses Unicode code points in its strings*, you can parse the entity easily enough:
function characterFromDecimalNumericEntity(str) {
    // Match a decimal numeric character reference like "&#228;"
    var decNumEntRex = /^\&#(\d+);$/;
    var match = decNumEntRex.exec(str);
    // Pull out the code point and turn it into the character it represents
    var codepoint = match ? parseInt(match[1], 10) : null;
    var character = codepoint ? String.fromCharCode(codepoint) : null;
    return character;
}
alert(characterFromDecimalNumericEntity("&#228;")); // ä
* Why "mostly": JavaScript strings are made up of 16-bit "characters" that correspond to UTF-16 code units, not Unicode code points (you can't store a Unicode code point in 16 bits, you need 21). All characters from the Basic Multilingual Plane fit within one UTF-16 code unit, but characters from the Supplementary Multilingual Plane, Supplementary Ideographic Plane, and so on require two UTF-16 code units for a character. One of those characters will occupy two "characters" in a JavaScript string. The function above would fail for them. More in the JavaScript spec and the Unicode FAQ.
Upvotes: 3