Reputation: 538
By default, String.prototype.normalize() uses NFC as its argument. NFC replaces multiple characters with a single one.
You can specify "NFC" to get the composed canonical form, in which multiple code points are replaced with single code points where possible.
And here's an example from MDN. It works:
let str = '\u006E\u0303'; // "n" followed by a combining tilde
str = str.normalize();
console.log(`${str}: ${str.length}`); // logs "ñ: 1"
But then I decided to try this method with other characters. For example:
let str = '\u0057\u0303'; // "W" followed by a combining tilde
str = str.normalize();
console.log(`${str}: ${str.length}`); // logs "W̃: 2"
What's wrong with the second example? Why doesn't it work?
Upvotes: 2
Views: 515
Reputation: 943615
It doesn't replace multiple characters; it replaces multiple code points, and only where possible.
ñ, being a character used in Spanish, has its own code point in Unicode (U+00F1), so you can just say ñ instead of "Take an n and then put a ~ on top of it".
W̃, being a representation of a phonic sound, doesn't have its own code point. It is used comparatively rarely, so it hasn't been given precious space in the more efficient bits of Unicode. The only way you can have one is to say "Take a W and then put a ~ on top of it".
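If you want to see this for yourself, here is a minimal sketch that lists the code points before and after normalization. The helper name codePoints is my own invention for illustration, not part of any API; only String.prototype.normalize(), codePointAt() and string iteration are built in.

```javascript
// Hypothetical helper (not a built-in): list a string's code points as U+XXXX labels.
// Spreading a string iterates by code point, not by UTF-16 code unit.
function codePoints(s) {
  return [...s].map(
    (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

// "n" + combining tilde has a precomposed form, so NFC merges the pair.
const composedN = '\u006E\u0303'.normalize('NFC');
// "W" + combining tilde has no precomposed form, so NFC leaves it alone.
const composedW = '\u0057\u0303'.normalize('NFC');
// NFD goes the other way: it splits the precomposed ñ back into two code points.
const decomposedN = '\u00F1'.normalize('NFD');

console.log(codePoints(composedN), composedN.length);     // U+00F1, length 1
console.log(codePoints(composedW), composedW.length);     // U+0057 and U+0303, length 2
console.log(codePoints(decomposedN), decomposedN.length); // U+006E and U+0303, length 2
```

So the second example is already in its normalized form: two code points is the shortest canonical representation that exists for W̃.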
Upvotes: 4