Reputation: 22422
I understand that ES6 will have a new function that performs Unicode normalization of a string (using the 'NFC' form, for example).
Reading http://www.unicode.org/faq/normalization.html, I saw this FAQ :
Q: What is the difference is between W3C normalization and Unicode normalization?
A: Unicode normalization comes in 4 flavors: C, D, KC, KD. It is C that is relevant for W3C normalization. W3C normalization also treats character references (&#nnnn;) as equivalent to characters. For example, the text string "a&#xnnnn;" (where nnnn = "0301") is Unicode-normalized since it consists only of ASCII characters, but it is not W3C-normalized, since it contains a representation of a combining acute accent with "a", and in normalization form C, that should have been normalized to U+00E1.
Does that mean we will need to replace all occurrences of &#xnnnn; with the characters they represent before calling normalize('NFC')? (The form name is case-sensitive, so it has to be 'NFC', not 'nfc'.)
Or will there be some sort of normalize('w3c') that treats a letter combined with an accent written as the ASCII sequence "&#xnnnn;" as equivalent to its normalized form?
Upvotes: 3
Views: 918
Reputation: 5787
By the time your JavaScript executes, the &...; is already gone if you are working with the DOM. The only time you would see it is if you somehow download the HTML source as raw text. And anyway, converting the &...; to the proper character is un-escaping, not normalization. So you would have to un-escape first, then normalize.
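For the raw-text case, the two steps could look roughly like the sketch below. The helper name decodeNumericCharRefs is made up for illustration; it only handles numeric references (&#nnnn; and &#xnnnn;), not named entities like &eacute;, and it does not reproduce the special cases an HTML parser applies (for example the remapping of &#128; to &#159;). Treat it as a minimal sketch, not a full un-escaper.

```javascript
// Hypothetical helper (not part of any standard API): decodes numeric
// character references such as "&#233;" or "&#x0301;" in a raw string.
function decodeNumericCharRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values and lone surrogates untouched.
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

// Step 1: un-escape. Step 2: normalize. Note the uppercase form name.
const raw = "a&#x0301;";
const unescaped = decodeNumericCharRefs(raw);   // "a" followed by U+0301
const normalized = unescaped.normalize("NFC");  // single code point U+00E1

console.log(unescaped.length, normalized.length);
console.log(normalized === "\u00E1");
```

If the string comes from the DOM (textContent, an input's value, and so on), the first step is unnecessary because the parser has already resolved the references, and calling normalize("NFC") directly is enough.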
Upvotes: 1