Jslater2001

Reputation: 21

How to find the surrogate pair of a symbol in java

I'm trying to get the AND operation symbol (⋀) to appear, but the symbol's Unicode value is U+22C1 (this may be wrong, but according to what I've read it's this). I also found another value, 2227, but that prints ࢳ. If you can, please explain the method for finding the surrogate pair, as I have to find it for a bunch more symbols.

Upvotes: 0

Views: 944

Answers (1)

skomisa

Reputation: 17383

Finding the surrogate pair for a symbol in Java is straightforward, with a couple of caveats:

  • Not all characters are represented by a surrogate pair, including your example ("⋀"), so always check for this before attempting to get the surrogate pair.
  • You need a font that can display the symbols, both in your source code, and in any output produced by your code. I used Monospaced in NetBeans for the code and output shown below.

Here is code to display some basic Unicode information for arbitrary symbols. It gets the code point for a symbol, determines whether it is represented by a surrogate pair, and if so gets the high and low surrogates. The code processes two symbols, one with a surrogate pair (the emoji "😊"), and one without (your "⋀" example).

package surrogates;

public class Surrogates {

    public static void main(String[] args) {
        Surrogates.displaySymbolDetails("😊️️");
        Surrogates.displaySymbolDetails("⋀️️");
    }

    static void displaySymbolDetails(String symbol) {
        int cp = symbol.codePointAt(0);
        String name = Character.getName(cp);
        System.out.println(symbol + " has code point " + cp + " (hex " + Integer.toHexString(cp) + ").");
        System.out.println(symbol + " has Unicode name " + name + ".");
        boolean isSupplementary = Character.isSupplementaryCodePoint(cp);
        if (isSupplementary) {
            System.out.println(symbol + " is a supplementary character.");
            char high = Character.highSurrogate(cp);
            char low = Character.lowSurrogate(cp);
            System.out.println(symbol + " has high surrogate: " + (int) high + ".");
            System.out.println(symbol + " has low surrogate: " + (int) low + ".");
        } else {
            System.out.println(symbol + " is in the BMP and therefore is not represented by a surrogate pair.");
        }
    }
}

Here is the output:

😊️️ has code point 128522 (hex 1f60a).
😊️️ has Unicode name SMILING FACE WITH SMILING EYES.
😊️️ is a supplementary character.
😊️️ has high surrogate: 55357.
😊️️ has low surrogate: 56842.
⋀️️ has code point 8896 (hex 22c0).
⋀️️ has Unicode name N-ARY LOGICAL AND.
⋀️️ is in the BMP and therefore is not represented by a surrogate pair.

Notes:

  • "symbol" can mean multiple things, but I am assuming that in your question you are simply referring to some Unicode character.
  • Symbols (i.e. Unicode characters) in the Basic Multilingual Plane (BMP, U+0000 to U+FFFF) are not represented by a surrogate pair. All other symbols are in one of the supplementary planes (U+10000 to U+10FFFF), and are represented by a surrogate pair in UTF-16, which is the encoding Java uses for char and String.
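
If you want to see the method behind the surrogate pair rather than just call Character.highSurrogate and Character.lowSurrogate, the standard UTF-16 arithmetic can be sketched as follows. The class name SurrogateMath and the helper toSurrogatePair are made up for this illustration; in real code the built-in Character methods are the better choice.

```java
public class SurrogateMath {

    // Hypothetical helper: splits a supplementary code point (U+10000..U+10FFFF)
    // into its UTF-16 high and low surrogates.
    static char[] toSurrogatePair(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException(
                    "Not a supplementary code point: " + Integer.toHexString(codePoint));
        }
        int offset = codePoint - 0x10000;               // leaves a 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int cp = 0x1F60A; // the emoji from the example above
        char[] pair = toSurrogatePair(cp);
        System.out.printf("U+%X -> high 0x%04X, low 0x%04X%n", cp, (int) pair[0], (int) pair[1]);
        // Cross-check against the built-in methods.
        System.out.println(pair[0] == Character.highSurrogate(cp)
                && pair[1] == Character.lowSurrogate(cp));
    }
}
```

For the emoji this gives 0xD83D and 0xDE0A, which are the decimal values 55357 and 56842 shown in the output above.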

Upvotes: 4
