Reputation: 538
I tried the normalize('NFKC') method with different characters, but it never gave the result I expected. Fortunately, I can't say the same about NFC: whenever possible, normalize('NFC') replaces multiple code points with a single one. For example:
let t1 = `\u00F4`; //ô
let t2 = `\u006F\u0302`; //ô
console.log(t2.normalize('NFC') == t1); //true
And here's an example with NFKC, which never works:
let s1 = '\uFB00'; //"ﬀ" (a single ligature character)
let s2 = '\u0066\u0066'; //"ff"
console.log(s2.normalize('NFKC') == s1); //false
I used to think that NFKC replaces multiple code points with the single one that represents the compatible character. Put simply, I thought that NFKC would replace \u0066\u0066 with \uFB00.
If NFKC doesn't work like that, then... how does it work?
Upvotes: 3
Views: 3017
Reputation: 538
The thing is that NFKC (as well as NFKD) applies both compatibility and canonical equivalence when normalizing. The Unicode normalization spec (UAX #15) describes it this way:
The type of full decomposition chosen depends on which Unicode Normalization Form is involved. For NFC or NFD, one does a full canonical decomposition, which makes use of only canonical Decomposition_Mapping values. For NFKC or NFKD, one does a full compatibility decomposition, which makes use of canonical and compatibility Decomposition_Mapping values.
And that's completely understandable because, as MDN says:
All canonically equivalent sequences are also compatible, but not vice versa.
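To make the two kinds of decomposition concrete, here is a minimal sketch that prints the code points produced by NFD and NFKD for the characters from the question. The helper name codePoints is made up for this illustration; only String.prototype.normalize is the real API.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
  return Array.from(str, ch =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const accented = '\u00F4'; // ô, has a canonical decomposition
const ligature = '\uFB00'; // ﬀ, has only a compatibility decomposition

// Canonical decomposition applies under both NFD and NFKD.
console.log(codePoints(accented.normalize('NFD')));  // [ 'U+006F', 'U+0302' ]
console.log(codePoints(accented.normalize('NFKD'))); // [ 'U+006F', 'U+0302' ]

// Compatibility decomposition applies only under NFKD.
console.log(codePoints(ligature.normalize('NFD')));  // [ 'U+FB00' ]
console.log(codePoints(ligature.normalize('NFKD'))); // [ 'U+0066', 'U+0066' ]
```

The ligature is left alone by NFD because its decomposition mapping is a compatibility mapping, which only the K forms use.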
But it's also worth noting that NFKC handles compatibility and canonical equivalence in different ways. For canonically equivalent sequences, NFKC produces the same result as NFC. For example:
//"ô" (U+00F4) -> "o" (U+006F) + " ̂" (U+0302) -> "ô" (U+00F4)
let c1 = `\u006F\u0302`; //ô
console.log(c1.normalize('NFKC').length); //1
But compatibility normalization with this form works differently. The spec says:
Normalization Form KC does not attempt to map character sequences to compatibility composites. For example, a compatibility composition of “office” does not produce “o\uFB03ce”, even though “\uFB03” is a character that is the compatibility equivalent of the sequence of three characters “ffi”. In other words, the composition phase of NFC and NFKC are the same—only their decomposition phase differs, with NFKC applying compatibility decompositions.
For example:
//"ﬀ" (U+FB00) -> "f" (U+0066) + "f" (U+0066) -> "f" (U+0066) + "f" (U+0066)
let c2 = '\uFB00'; //"ﬀ", a single code point before normalization
console.log(c2.normalize('NFKC').length); //2
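Since NFKC only ever expands the ligature and never recreates it, the comparison from the question works if both sides are normalized rather than just one. A minimal sketch, assuming that is the goal; the helper name equalUnderNFKC is made up for this example:

```javascript
// Hypothetical helper: treat two strings as equal if their NFKC forms match.
function equalUnderNFKC(a, b) {
  return a.normalize('NFKC') === b.normalize('NFKC');
}

// The ligature and the plain letters meet at the decomposed form "ff".
console.log(equalUnderNFKC('\uFB00', '\u0066\u0066')); // true

// The "office" case from the spec: the ligature is expanded...
console.log('o\uFB03ce'.normalize('NFKC') === 'office'); // true
// ...but plain letters are never composed back into the ligature.
console.log('office'.normalize('NFKC') === 'o\uFB03ce'); // false
```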
Upvotes: 6