Siler

Reputation: 9484

Meaning of escaped unicode characters in JSON

In JSON, Unicode characters can be escaped using the \uXXXX notation. I assume XXXX is a Unicode code point in hexadecimal.

But since there are only 4 digits, does this mean there is no way to escape codepoints which are > 0xFFFF?

Or does \uXXXX encode UTF-16 code units rather than abstract code points?

Upvotes: 4

Views: 2232

Answers (1)

Brett Zamir

Reputation: 14345

Update 4/28/2024

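Since ES2015, JavaScript string literals have accepted sequences like \u{X} or \u{XXXXXX} to represent code points, including those greater than 0xFFFF. Note that this is JavaScript syntax only; JSON itself still accepts only the four-digit \uXXXX form.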

var s = '\u{2f804}';
alert(s + '::' + s.length); // 你::2
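
To check how the braced form behaves as JSON rather than as a JavaScript string literal, here is a minimal sketch. It assumes a standard JSON.parse; parsesAsJson is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: report whether a piece of text is accepted by JSON.parse.
function parsesAsJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// The backslashes are doubled so that the JSON text itself contains the escape.
const braceForm = '"\\u{2f804}"';     // JSON text: "\u{2f804}"
const pairForm = '"\\uD87E\\uDC04"';  // JSON text: "\uD87E\uDC04"

console.log(parsesAsJson(braceForm)); // false: JSON requires exactly four hex digits after \u
console.log(parsesAsJson(pairForm));  // true: a surrogate pair is valid JSON
```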

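In JSON, the escape is \uXXXX, and yes, it is possible to represent characters greater than 0xFFFF using a high surrogate followed by a low surrogate, along the lines you mention. Each \uXXXX is a UTF-16 code unit rather than an arbitrary code point. The pair below encodes U+2F804: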

var s = '\uD87E\uDC04';
alert(s + '::' + s.length); // 你::2
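
The surrogate pair above can be derived from the code point with the standard UTF-16 arithmetic. The following is a rough sketch, not a library API: toJsonEscape is a hypothetical helper name, and it assumes the input is a valid code point in the range 0 to 0x10FFFF.

```javascript
// Hypothetical helper: build the JSON \uXXXX escape(s) for a single code point.
function toJsonEscape(codePoint) {
  // Format one 16-bit code unit as \u followed by four uppercase hex digits.
  const hex = (unit) => '\\u' + unit.toString(16).toUpperCase().padStart(4, '0');

  if (codePoint <= 0xFFFF) {
    return hex(codePoint); // fits in a single code unit
  }

  // Above 0xFFFF: subtract 0x10000 and split the remaining 20 bits in half.
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return hex(high) + hex(low);
}

const escaped = toJsonEscape(0x2F804);
const decoded = JSON.parse('"' + escaped + '"');

console.log(escaped);                             // \uD87E\uDC04
console.log(decoded.codePointAt(0).toString(16)); // 2f804
console.log(decoded.length);                      // 2
```

The length is 2 because JavaScript strings count UTF-16 code units, which is why both forms in this answer report the same value.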

Upvotes: 3
