gilly3

Reputation: 91467

Expressing UTF-16 Unicode characters in JavaScript

To express, for example, the character U+10400 in JavaScript, I use "\uD801\uDC00" or String.fromCharCode(0xD801) + String.fromCharCode(0xDC00). How do I work out that surrogate pair for a given Unicode code point? I want something like the following:

var char = getUnicodeCharacter(0x10400);

How do I find 0xD801 and 0xDC00 from 0x10400?

Upvotes: 15

Views: 18088

Answers (2)

Mathias Bynens

Reputation: 149504

How do I find 0xD801 and 0xDC00 from 0x10400?

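JavaScript exposes strings as sequences of 16-bit code units (UCS-2/UTF-16), so a character outside the BMP takes up two of them: a surrogate pair. That’s why String#charCodeAt() doesn’t work the way you’d want it to: it returns a single code unit, not the full code point.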

If you want to get the code point of every Unicode character (including non-BMP characters) in a string, you could use Punycode.js’s utility functions to convert between UCS-2 strings and Unicode code points:

// String#charCodeAt() replacement that only considers full Unicode characters
punycode.ucs2.decode('𝌆'); // [119558]
punycode.ucs2.decode('abc'); // [97, 98, 99]
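
For illustration, here is a rough sketch of how such a decoder could work without the library. The name decodeCodePoints is made up for this example, and this is not the Punycode.js source, just the standard surrogate-pair arithmetic run in reverse:

```javascript
// Hypothetical helper (not part of Punycode.js): returns an array of
// Unicode code points for a string, combining surrogate pairs.
function decodeCodePoints(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes one code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        out.push(((unit - 0xD800) << 10) + (next - 0xDC00) + 0x10000);
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    // BMP character, or an unpaired surrogate passed through unchanged.
    out.push(unit);
  }
  return out;
}

decodeCodePoints('\uD834\uDF06'); // [119558], i.e. U+1D306
decodeCodePoints('abc'); // [97, 98, 99]
```

Unpaired surrogates are passed through as-is here; whether that is the right choice depends on what you do with the result.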

If you don’t need to do it programmatically though, and you’ve already got the character, just use mothereff.in/js-escapes. It will tell you how to escape any character in JavaScript.

Upvotes: 5

Arnaud Le Blanc

Reputation: 99889

Based on the Wikipedia article given by Henning Makholm, the following function will return the correct character for a code point:

function getUnicodeCharacter(cp) {

    if (cp >= 0 && cp <= 0xD7FF || cp >= 0xE000 && cp <= 0xFFFF) {
        return String.fromCharCode(cp);
    } else if (cp >= 0x10000 && cp <= 0x10FFFF) {

        // subtract 0x10000 from cp to get a 20-bit number
        // in the range 0..0xFFFFF
        cp -= 0x10000;

        // add 0xD800 to the number formed by the high 10 bits
        // to give the first code unit (the high surrogate)
        var first = ((0xffc00 & cp) >> 10) + 0xD800;

        // add 0xDC00 to the number formed by the low 10 bits
        // to give the second code unit (the low surrogate)
        var second = (0x3ff & cp) + 0xDC00;

        return String.fromCharCode(first) + String.fromCharCode(second);
    }
}
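
To answer the 0x10400 case directly, the same arithmetic can be walked through by hand. The sketch below uses a made-up helper, toSurrogatePair, that returns the two code units as numbers so they can be inspected. The comparison with String.fromCodePoint assumes an ES2015 or later engine:

```javascript
// Hypothetical helper: returns [highSurrogate, lowSurrogate] for a
// code point in the range 0x10000..0x10FFFF.
function toSurrogatePair(cp) {
  var offset = cp - 0x10000;               // 0x10400 - 0x10000 = 0x400
  var high = (offset >> 10) + 0xD800;      // 0x400 >> 10 = 1, so 0xD801
  var low = (offset & 0x3FF) + 0xDC00;     // 0x400 & 0x3FF = 0, so 0xDC00
  return [high, low];
}

var pair = toSurrogatePair(0x10400);
var highHex = pair[0].toString(16); // "d801"
var lowHex = pair[1].toString(16);  // "dc00"

// On ES2015+ engines, String.fromCodePoint does this conversion natively.
var sameString =
  String.fromCharCode(pair[0], pair[1]) === String.fromCodePoint(0x10400); // true
```

So the high 10 bits of (cp - 0x10000) select the first code unit and the low 10 bits select the second.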

Upvotes: 17
