Jorge Sainz

Reputation: 133

Strange UTF-8 character behavior in Blogger using JavaScript

I'm using JavaScript in Blogger, but I'm seeing strange behavior with the characters '¡' and '¿'. For example, the following code shows ñéúüëò¡!¿? inside the div but ñéúüëò&#161;!&#191;? in the alert message.

<div id="test">
</div>
<script>
(function () {
  document.getElementById("test").innerHTML = 'ñéúüëò¡!¿?';
  alert('ñéúüëò¡!¿?');
})();
</script>

If we look at the generated code, we can see that the script element has been converted to:

<script>
(function () {
document.getElementById("test").innerHTML = 'ñéúüëò&#161;!&#191;?';
alert('ñéúüëò&#161;!&#191;?');
})();
</script>

However, it works if I use an external JS file:

<div id="test">
</div>
<script src="http://foo.bar/file.js"></script>

where the JS file (UTF-8 encoded) contains:

document.getElementById("test").innerHTML = 'ñéúüëò¡!¿?';
alert('ñéúüëò¡!¿?');

And the result is the expected one: ñéúüëò¡!¿? both inside the div and in the alert message.

Even stranger, I can write the following code in Blogger, which produces the desired behavior even though it isn't very clean:

<div id="div1" style="display:none">¡</div>
<div id="div2" style="display:none">¿</div>

<div id="test">
</div>

<script>
(function () {
  document.getElementById("test").innerHTML = 'ñéúüëò¡!¿?';
  alert('ñéúüëò'+ document.getElementById('div1').innerHTML + '!' + document.getElementById('div2').innerHTML +'?');
})();
</script>

Can someone explain how I could write proper, clean code to solve this without using external JS files?

Upvotes: 0

Views: 196

Answers (1)

Mike Samuel

Reputation: 120506

'ñéúüëò¡!¿?'

can be written

'ñéúüëò\u00a1!\u00bf?'

which might make it past whatever over-escaping of script element bodies is happening. Alternatively, it can be written

 '\u00f1\u00e9\u00fa\u00fc\u00eb\u00f2\u00a1\u0021\u00bf\u003f'

which contains only 7-bit ASCII code-points, so is less likely to run into character set confusion or over-zealous escapers.

\u00f1 encodes Unicode code point 241 (hexadecimal F1). More generally, \u followed by four hexadecimal digits encodes the code point whose integer value those four digits specify.
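If you have many strings to convert, you could generate the escaped form instead of looking up each code point by hand. The following is only a sketch: toUnicodeEscapes is a hypothetical helper name, not something Blogger or the browser provides, and it assumes you run it once (for example in a browser console or Node) and paste the result into your script. It leaves ASCII characters untouched, so quotes and backslashes in the input still need the usual escaping.

```javascript
// Hypothetical helper (not part of any library): rewrites every non-ASCII
// UTF-16 code unit of a string as a \uXXXX escape, leaving ASCII as-is.
function toUnicodeEscapes(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // UTF-16 code unit, 0..65535
    if (unit > 0x7f) {
      // Four lowercase hex digits, zero-padded, e.g. 241 -> "00f1"
      out += '\\u' + unit.toString(16).padStart(4, '0');
    } else {
      out += text[i];
    }
  }
  return out;
}

// Print the escaped form so it can be pasted into a string literal.
console.log(toUnicodeEscapes('ñéúüëò¡!¿?'));
```

Because the helper works on UTF-16 code units, a character outside the Basic Multilingual Plane comes out as two \u escapes (a surrogate pair), which is still a valid JavaScript string literal.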

Upvotes: 1
