Milad R.e

Reputation: 31

Arabic Unicode or ASCII code in Java

I want to find the ASCII code of each character when programming Android to support the Arabic locale. Many Arabic characters are different from English ones: in many cases letters are joined together, and some letters are split. How can I find the specific code for each letter?

Upvotes: 1

Views: 3320

Answers (2)

Joop Eggen

Reputation: 109547

Unicode is a numbering of all characters. The numbers go up to U+10FFFF, so they would need three-byte integers. A Unicode character is conventionally written as U+XXXX, where XXXX is the number in hexadecimal (base 16) notation, padded to at least four digits. Such a number is called a code point; in Java its type is int.
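To make the notation concrete, here is a small sketch (the class and method names are hypothetical, chosen for illustration) that formats a code point as U+XXXX and checks that the largest code point, `Character.MAX_CODE_POINT` (U+10FFFF), needs 21 bits, i.e. three bytes.

```java
// Hypothetical helper class: shows the U+XXXX notation and the size of the code point range.
class CodePointNotation {
    // Formats a code point as U+XXXX (upper-case hex, at least four digits).
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    // Number of bits needed to store the largest code point, U+10FFFF.
    static int bitsForMaxCodePoint() {
        return Integer.toBinaryString(Character.MAX_CODE_POINT).length();
    }

    public static void main(String[] args) {
        System.out.println(toUPlus(0x0627));                   // U+0627 (Arabic Alef)
        System.out.println(toUPlus(Character.MAX_CODE_POINT)); // U+10FFFF
        System.out.println(bitsForMaxCodePoint());             // 21
    }
}
```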

A Java char is 2 bytes (one UTF-16 code unit), so it cannot represent code points above U+FFFF; for those, a pair of two chars (a surrogate pair) is used.
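The following sketch (hypothetical class and method names) lists the UTF-16 chars that store a code point. U+0627 (Alef) fits in one char, while U+1EE00, taken here as an assumed example from the Arabic Mathematical Alphabetic Symbols block, lies above U+FFFF and is stored as the surrogate pair D83B DE00.

```java
// Hypothetical helper class: shows how many chars a code point occupies in UTF-16.
class SurrogateDemo {
    // Returns the UTF-16 chars used to store a code point, as space-separated hex.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c)); // cast: %X does not accept a Character
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(0x0627));  // 0627 (one char)
        System.out.println(utf16Units(0x1EE00)); // D83B DE00 (high + low surrogate)

        char[] pair = Character.toChars(0x1EE00);
        // Recombining the pair gives back the original code point.
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == 0x1EE00); // true
    }
}
```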

The Java class Character handles the conversions.

    char lowUnicode = '\u0627'; // Alef, fits in a single char
    int cp = (int) lowUnicode;
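Going the other way is just as short. This sketch (the `describe` helper is hypothetical) converts the char to an int and back, and uses `Character.getName` to look up the official Unicode name.

```java
// Hypothetical helper class: char <-> int conversion plus the Unicode name.
class CharConversion {
    // Returns e.g. "U+0627 ARABIC LETTER ALEF" for a char in the BMP.
    static String describe(char c) {
        return String.format("U+%04X %s", (int) c, Character.getName(c));
    }

    public static void main(String[] args) {
        char alef = '\u0627';   // ARABIC LETTER ALEF
        int cp = alef;          // widening char -> int needs no cast
        char back = (char) cp;  // narrowing is safe here because cp <= 0xFFFF

        System.out.println(describe(alef)); // U+0627 ARABIC LETTER ALEF
        System.out.println(back == alef);   // true

        // Character.toChars works for any code point, inside or outside the BMP.
        String s = new String(Character.toChars(cp));
        System.out.println(s.length());     // 1
    }
}
```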

One can iterate through code points of a String as follows:

    String s = "...";
    for (int i = 0; i < s.length(); ) {
        int codePoint = s.codePointAt(i);
        // ... use codePoint here ...
        i += Character.charCount(codePoint); // 2 for a surrogate pair, otherwise 1
    }
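Applied to a real Arabic word, the loop above yields one code point per letter. The string stores the same code points whether or not the letters are drawn joined; the joining is done when the text is rendered. A sketch, with a hypothetical method name, using "salam" written with escapes (SEEN, LAM, ALEF, MEEM):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: the code point loop, collecting U+XXXX strings.
class CodePointLoop {
    static List<String> codePointsOf(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            result.add(String.format("U+%04X", codePoint));
            i += Character.charCount(codePoint); // skip both chars of a surrogate pair
        }
        return result;
    }

    public static void main(String[] args) {
        // "salam": SEEN, LAM, ALEF, MEEM
        System.out.println(codePointsOf("\u0633\u0644\u0627\u0645"));
        // [U+0633, U+0644, U+0627, U+0645]
    }
}
```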

Or, in Java 8 and later:

    s.codePoints().forEach(
        (codePoint) -> System.out.println(codePoint));
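Instead of printing each decimal value, the stream can also be mapped to U+XXXX strings and joined. A minimal sketch, assuming Java 8+ (the class and method names are hypothetical); the input is "arabi" written with escapes (AIN, REH, BEH, YEH):

```java
import java.util.stream.Collectors;

// Hypothetical helper class: the Java 8 codePoints() stream, formatted as hex.
class CodePointStream {
    static String hexCodes(String s) {
        return s.codePoints()                                   // IntStream of code points
                .mapToObj(cp -> String.format("U+%04X", cp))    // int -> "U+XXXX"
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(hexCodes("\u0639\u0631\u0628\u064A")); // U+0639 U+0631 U+0628 U+064A
    }
}
```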

Dumping Arabic between U+0600 and U+08FF:

The code below prints every alphabetic Arabic-script code point in this main Arabic range, together with its Unicode name.

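    for (int codePoint = 0x600; codePoint < 0x900; ++codePoint) {
        // The range also contains other scripts (e.g. Syriac, Thaana), so filter on the script.
        if (Character.isAlphabetic(codePoint)
                && Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.ARABIC) {

            // U+200E (LRM) and U+200F (RLM) keep the mixed-direction output in order.
            System.out.printf("\u200E\\%04X \u200F%s\u200E %s%n",
                    codePoint,
                    new String(Character.toChars(codePoint)),
                    Character.getName(codePoint));
        }
    }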
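If the letters are needed as data rather than as printed text, the same filter as the dump loop can collect them into a list. This is a sketch with a hypothetical class and method name; the range bounds are parameters so other blocks can be scanned the same way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: the dump loop's filter, collected instead of printed.
class ArabicLetters {
    static List<Integer> arabicLetters(int from, int toExclusive) {
        List<Integer> result = new ArrayList<>();
        for (int cp = from; cp < toExclusive; ++cp) {
            if (Character.isAlphabetic(cp)
                    && Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> letters = arabicLetters(0x0600, 0x0900);
        System.out.println(letters.size() + " alphabetic Arabic code points");
        for (int cp : letters) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```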

Windows, Linux and other systems have character map tools for browsing Unicode. In the format string of the Arabic dump loop, U+200E is the Left-to-Right Mark (LRM) and U+200F is the Right-to-Left Mark (RLM).
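The two marks are invisible format characters that only influence bidirectional layout. The sketch below (the `wrapRtl` helper is hypothetical) checks their names and type with the Character class and reproduces the RLM ... LRM wrapping that the dump loop's format string puts around each Arabic letter.

```java
// Hypothetical helper class: properties of the LRM/RLM marks and Arabic directionality.
class BidiMarks {
    static final char LRM = '\u200E'; // LEFT-TO-RIGHT MARK
    static final char RLM = '\u200F'; // RIGHT-TO-LEFT MARK

    // Opens a right-to-left run with RLM and closes it with LRM.
    static String wrapRtl(String rtlText) {
        return RLM + rtlText + LRM;
    }

    public static void main(String[] args) {
        System.out.println(Character.getName(LRM));                     // LEFT-TO-RIGHT MARK
        System.out.println(Character.getName(RLM));                     // RIGHT-TO-LEFT MARK
        System.out.println(Character.getType(RLM) == Character.FORMAT); // true

        // Arabic letters carry their own right-to-left (Arabic) directionality.
        System.out.println(Character.getDirectionality('\u0627')
                == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC);      // true
        System.out.println(wrapRtl("\u0627").length());                 // 3
    }
}
```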

Upvotes: 4

Morteza

Reputation: 550

If you want to get the Unicode code of a character, the code below will do that:

    char character = 'ع';
    int code = (int) character;
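The cast gives the decimal value; `Integer.toHexString` turns it into the familiar hex form. A char cast only covers characters up to U+FFFF, so for arbitrary text `String.codePointAt` is the safer choice. A sketch, with a hypothetical `codeOf` helper and the letter AIN written as an escape:

```java
// Hypothetical helper class: getting a character's code as decimal and hex.
class CharCode {
    // Works for characters both inside and outside the BMP.
    static int codeOf(String s, int index) {
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        char character = '\u0639';                          // ARABIC LETTER AIN
        int code = (int) character;
        System.out.println(code);                           // 1593
        System.out.println(Integer.toHexString(code));      // 639
        System.out.println(codeOf("\u0639", 0) == code);    // true
    }
}
```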

Upvotes: 3
