Reputation: 34543
I'm under the impression that the JavaScript interpreter assumes that the source code it is interpreting has already been normalized. What, exactly, does the normalizing? It can't be the text editor, otherwise the plaintext representation of the source would change. Is there some "preprocessor" that does the normalization?
Upvotes: 15
Views: 16524
Reputation:
I've updated @bobince's answer:
var cafe4= 'caf\u00E9';
var cafe5= 'cafe\u0301';
console.log (
cafe4+' '+cafe4.length, // café 4
cafe5+' '+cafe5.length, // café 5
cafe4 === cafe5, // false
cafe4.normalize() === cafe5.normalize() // true
);
Upvotes: 1
Reputation: 536775
No. As of ECMAScript 5, no Unicode normalization is applied automatically to JavaScript source or strings, and none is even available to scripts. All characters remain unchanged as their original code points, potentially in a non-normalized form.
For example, try:
<script type="text/javascript">
var a= 'caf\u00E9';  // precomposed é (U+00E9)
var b= 'cafe\u0301'; // e followed by combining acute accent (U+0301)
alert(a+' '+a.length); // café 4
alert(b+' '+b.length); // café 5
alert(a==b); // false
</script>
Update: ECMAScript 6 introduces Unicode normalization for JavaScript strings via String.prototype.normalize().
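As a rough sketch of what that normalization does to the two strings above (assuming an engine that implements String.prototype.normalize; the variable names are only for illustration):

```javascript
// Two spellings of "café": precomposed é vs. e + combining acute accent.
var composed = 'caf\u00E9';
var decomposed = 'cafe\u0301';

// NFC composes base + combining mark into a single code point where possible;
// NFD decomposes precomposed characters into base + combining marks.
var nfc = decomposed.normalize('NFC');
var nfd = composed.normalize('NFD');

console.log(nfc.length);          // 4
console.log(nfd.length);          // 5
console.log(nfc === composed);    // true
console.log(nfd === decomposed);  // true
```

Whichever form you pick, the point is to normalize both sides to the same form before comparing.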
Upvotes: 15
Reputation: 149804
ECMAScript 6 introduces String.prototype.normalize() which takes care of Unicode normalization for you.
unorm is a JavaScript polyfill for this method, so that you can already use String.prototype.normalize() today even though not a single engine supports it natively at the moment.
For more information on how and when to use Unicode normalization in JavaScript, see JavaScript has a Unicode problem – Accounting for lookalikes.
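A minimal sketch of how this might be used for comparisons. The helper name canonicallyEqual is hypothetical (it is not part of any library), and the check assumes that either the engine or a polyfill has provided String.prototype.normalize:

```javascript
// Hypothetical helper: compares two strings by canonical equivalence
// rather than by raw code units.
function canonicallyEqual(a, b) {
  if (typeof String.prototype.normalize !== 'function') {
    // Assumption: a polyfill should have been loaded before this point.
    throw new Error('String.prototype.normalize is not available');
  }
  // Normalize both sides to the same form (NFC here) before comparing.
  return a.normalize('NFC') === b.normalize('NFC');
}

console.log(canonicallyEqual('caf\u00E9', 'cafe\u0301')); // true
console.log(canonicallyEqual('cafe', 'caf\u00E9'));       // false
```

Note that this only covers canonically equivalent sequences; visually similar but distinct characters still compare as different.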
Upvotes: 18
Reputation: 86165
If you're using node.js, there is a unorm library for this.
https://github.com/walling/unorm
Upvotes: 11