user1589188

Reputation: 5736

Finding the equivalent Unicode codepoint given an extended ASCII codepoint and a codepage in Java?

I am trying to write a method that, given a byte value and a specific codepage, finds the Unicode codepoint of the character that byte represents in that codepage.

For example, given char c = 128, which is '€' in the Windows-1252 codepage, calling

int result = asUnicode(c, "windows-1252")

should give 8364 (U+20AC). For the same char c = 128, which is 'Ђ' in the Windows-1251 codepage, calling

int result = asUnicode(c, "windows-1251")

should give 1026 (U+0402).

How can this be done in Java?

Upvotes: 1

Views: 176

Answers (1)

Savior

Reputation: 3531

c shouldn't really be a char, but a byte[] holding bytes in the corresponding encoding, e.g. windows-1252.

For this simple single-byte case, we can just wrap the char in a byte[] ourselves.

You need to decode those bytes into Java's char type, which holds UTF-16 code units: a BMP code point fits in a single char, while anything outside the BMP takes a surrogate pair. Then you return the corresponding code point.

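import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public static int asUnicode(char c, String charset) throws Exception {
    // Treat c as a single byte in the given code page and decode it.
    CharBuffer result = Charset.forName(charset).decode(ByteBuffer.wrap(new byte[] { (byte) c }));
    int unicode;
    char first = result.get();
    if (Character.isSurrogate(first)) {
        // Outside the BMP: combine the surrogate pair into one code point.
        unicode = Character.toCodePoint(first, result.get());
    } else {
        unicode = first;
    }
    return unicode;
}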

The following

public static void main(String[] args) throws Exception {
    char c = 128;
    System.out.println(asUnicode(c, "windows-1252"));
    System.out.println(asUnicode(c, "windows-1251"));
}

prints

8364
1026
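
One caveat: Charset.decode silently substitutes the charset's replacement string (usually U+FFFD) for bytes that the code page leaves undefined, and the (byte) cast drops any bits above the low eight. If you would rather get an error in those cases, a stricter variant could look like the sketch below. The class name CodePageLookup, the int parameter, and the choice to throw IllegalArgumentException are my own assumptions, not part of the question.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class CodePageLookup {

    // Hypothetical helper: maps one byte value (0-255) of a single-byte
    // code page to its Unicode code point.
    public static int asUnicode(int byteValue, String charsetName) {
        if (byteValue < 0 || byteValue > 0xFF) {
            throw new IllegalArgumentException("not a single byte: " + byteValue);
        }
        try {
            CharBuffer decoded = Charset.forName(charsetName)
                    .newDecoder()
                    // Report undefined or malformed bytes instead of replacing them.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(new byte[] { (byte) byteValue }));
            // codePointAt combines a surrogate pair if the decoder produced one.
            return Character.codePointAt(decoded, 0);
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                    "byte " + byteValue + " cannot be decoded as " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(asUnicode(128, "windows-1252")); // 8364
        System.out.println(asUnicode(128, "windows-1251")); // 1026
    }
}
```

Taking an int rather than a char makes the 0-255 range check explicit, so an out-of-range value fails loudly instead of being truncated.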

Upvotes: 2
