overexchange

Reputation: 1

Unicode escape sequence for a non-BMP character

In Java, a Unicode character can be written as a Unicode escape sequence, which denotes a single UTF-16 code unit. Below is an example that represents a BMP character:

char ch = '\u00A5'; // '¥'

Can surrogate pairs be used for non-BMP characters?

char ch4 = '\uD800\uDC00'; //Invalid character constant

How do I represent a non-BMP character using Java syntax?

Upvotes: 4

Views: 1160

Answers (2)

noraj

Reputation: 4622

To avoid writing surrogate pairs for non-BMP characters, there are several ways to obtain a String directly from a code point:

String test1 = new String(new int[] { 0x1f4ae }, 0, 1);
String test2 = String.valueOf(Character.toChars(0x1f4ae));
String test3 = Character.toString(0x1f4ae); // the int overload requires Java 11+

Upvotes: 1

fge

Reputation: 121720

You cannot do that with a single char constant, since a char is a UTF-16 code unit. You have to use a String constant, such as:

final String s = "\uXXXX\uYYYY";

where XXXX is the high surrogate and YYYY is the low surrogate.
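
As a concrete illustration, here is a minimal sketch using U+1F4AE, whose UTF-16 surrogate pair works out to D83D (high) and DCAE (low). The class name SurrogateLiteralDemo and the constant FLOWER are made up for this example.

```java
public class SurrogateLiteralDemo {
    // U+1F4AE written as its UTF-16 surrogate pair: high D83D, low DCAE.
    // FLOWER is a hypothetical name chosen for this sketch.
    public static final String FLOWER = "\uD83D\uDCAE";

    public static void main(String[] args) {
        // The String holds two UTF-16 code units...
        System.out.println(FLOWER.length());                             // 2
        // ...but they encode a single code point.
        System.out.println(FLOWER.codePointCount(0, FLOWER.length()));   // 1
        System.out.println(Integer.toHexString(FLOWER.codePointAt(0)));  // 1f4ae
        // Each half is recognised as the matching kind of surrogate.
        System.out.println(Character.isHighSurrogate(FLOWER.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(FLOWER.charAt(1)));  // true
    }
}
```

Note that length() counts code units, so it reports 2 here even though the String contains one character from the reader's point of view.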

Another solution is to use an int to store the code point; you can then use Character.toChars() to obtain a char[] out of it:

final int codePoint = 0x1f4ae; // for instance
final char[] toChars = Character.toChars(codePoint);
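
To see what that char[] actually contains, a small sketch along these lines may help. The class ToCharsDemo and the helper fromCodePoint are hypothetical names for this example; Character.toChars() and Character.charCount() are standard library methods.

```java
public class ToCharsDemo {
    // Hypothetical helper: build a String from a single code point.
    public static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int codePoint = 0x1F4AE;
        char[] units = Character.toChars(codePoint);

        // A supplementary code point expands to two code units (a surrogate pair).
        System.out.println(units.length);                                 // 2
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D83D DCAE

        // charCount() reports the number of code units without allocating an array.
        System.out.println(Character.charCount(codePoint));               // 2

        // A BMP code point such as U+00A5 fits in a single char.
        System.out.println(Character.toChars(0x00A5).length);             // 1

        // Round trip: the String decodes back to the original code point.
        System.out.println(fromCodePoint(codePoint).codePointAt(0) == codePoint); // true
    }
}
```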

Depending on what you use, you may also append code points directly (StringBuilder has appendCodePoint() for that, for instance).
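
For instance, a rough sketch of the StringBuilder route could look like this. The class AppendCodePointDemo and the method build are names invented for the example; appendCodePoint() itself is part of the standard StringBuilder API.

```java
public class AppendCodePointDemo {
    // Hypothetical helper: concatenate arbitrary code points into a String.
    public static String build(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            // Appends one char for BMP code points, a surrogate pair otherwise.
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One BMP character (U+00A5) followed by one supplementary character (U+1F4AE).
        String s = build(0x00A5, 0x1F4AE);
        System.out.println(s.length());                      // 3 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
    }
}
```

This keeps the surrogate arithmetic out of your own code, since the builder decides how many code units each code point needs.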

Upvotes: 9
