Reputation: 9484
In JSON, Unicode characters can be escaped using the \uXXXX notation. I assume the XXXX refers to a Unicode code point in hexadecimal.
But since there are only 4 digits, does this mean there is no way to escape code points greater than 0xFFFF?
Or does \uXXXX not encode abstract code points at all, but rather UTF-16 code units (as in UTF-16BE)?
Upvotes: 4
Views: 2232
Reputation: 14345
Update 4/28/2024
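In JavaScript source code (ES2015 and later), it has for some time been possible to use sequences like \u{X} or \u{XXXXXX} to represent code points, including those greater than 0xFFFF. This brace form is JavaScript string-literal syntax; JSON itself still only accepts \uXXXX.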
var s = '\u{2f804}';
alert(s + '::' + s.length); // 你::2
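To check how a JSON parser treats the two escape forms, here is a minimal sketch using the built-in JSON.parse. The helper name parsesAsJson is my own, not part of any API, and the doubled backslashes are there so the JSON text contains a literal backslash rather than an already-decoded character.

```javascript
// Hypothetical helper (not a built-in): true if `text` parses as JSON.
function parsesAsJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// The doubled backslash keeps the escape inside the JSON text itself.
const braceForm = '"\\u{2f804}"';        // JSON text: "\u{2f804}"
const surrogateForm = '"\\uD87E\\uDC04"'; // JSON text: "\uD87E\uDC04"

const braceOk = parsesAsJson(braceForm);         // brace form is rejected
const surrogateOk = parsesAsJson(surrogateForm); // surrogate pair is accepted

console.log(braceOk, surrogateOk);
```

So the brace form is fine inside a JavaScript string literal, but a JSON document has to spell the same character as a surrogate pair.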
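In JSON the escape is \uXXXX, with exactly four hex digits, and yes, it is possible to represent characters greater than 0xFFFF using a pair of escapes, a high surrogate followed by a low surrogate (two UTF-16 code units), along the lines you mention.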
var s = '\uD87E\uDC04';
alert(s + '::' + s.length); // 你::2
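For reference, the surrogate pair can be derived from the code point with the standard UTF-16 arithmetic. This is only a sketch: toJsonEscape is a hypothetical helper name, and it assumes its input is a valid code point in the range 0 to 0x10FFFF.

```javascript
// Hypothetical helper: JSON-style \uXXXX escape(s) for a single code point.
// Assumes codePoint is a valid Unicode code point (0 .. 0x10FFFF).
function toJsonEscape(codePoint) {
  // Format one 16-bit value as \uXXXX with four uppercase hex digits.
  const hex4 = (n) => '\\u' + n.toString(16).toUpperCase().padStart(4, '0');

  if (codePoint <= 0xFFFF) {
    return hex4(codePoint); // fits in a single escape
  }

  // Above 0xFFFF: split the 20-bit offset into two 10-bit halves.
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // high (lead) surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // low (trail) surrogate
  return hex4(high) + hex4(low);
}

const escaped = toJsonEscape(0x2F804);            // \uD87E\uDC04
const decoded = JSON.parse('"' + escaped + '"');  // round-trip through JSON

console.log(escaped, decoded.codePointAt(0).toString(16), decoded.length);
```

The length of 2 reported by s.length is the same effect: JavaScript strings count UTF-16 code units, so one character above 0xFFFF counts as two.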
Upvotes: 3