Saurabh
Saurabh

Reputation: 2472

Get unicode value of a character

Is there any way in Java so that I can get Unicode equivalent of any character? e.g.

Suppose a method getUnicode(char c). A call getUnicode('÷') should return \u00f7.

Upvotes: 82

Views: 200101

Answers (6)

Deepak Sharma
Deepak Sharma

Reputation: 476

char c = 'a';
String a = Integer.toHexString(c); // gives you---> a = "61"

Upvotes: 10

Jordan Doerksen
Jordan Doerksen

Reputation: 46

are you picky with using Unicode because with java its more simple if you write your program to use "dec" value or (HTML-Code) then you can simply cast data types between char and int

char a = 98;
char b = 'b';
char c = (char) (b+0002);

System.out.println(a);
System.out.println((int)b);
System.out.println((int)c);
System.out.println(c);

Gives this output

b
98
100
d

Upvotes: 1

Yogesh Dubey
Yogesh Dubey

Reputation: 173

private static String toUnicode(char ch) {
    return String.format("\\u%04x", (int) ch);
}

Upvotes: 14

Aaron Digulla
Aaron Digulla

Reputation: 328840

If you have Java 5, use char c = ...; String s = String.format ("\\u%04x", (int)c);

If your source isn't a Unicode character (char) but a String, you must use charAt(index) to get the Unicode character at position index.

Don't use codePointAt(index) because that will return 24bit values (full Unicode) which can't be represented with just 4 hex digits (it needs 6). See the docs for an explanation.

[EDIT] To make it clear: This answer doesn't use Unicode but the method which Java uses to represent Unicode characters (i.e. surrogate pairs) since char is 16bit and Unicode is 24bit. The question should be: "How can I convert char to a 4-digit hex number", since it's not (really) about Unicode.

Upvotes: 42

SyntaxT3rr0r
SyntaxT3rr0r

Reputation: 28344

You can do it for any Java char using the one liner here:

System.out.println( "\\u" + Integer.toHexString('÷' | 0x10000).substring(1) );

But it's only going to work for the Unicode characters up to Unicode 3.0, which is why I precised you could do it for any Java char.

Because Java was designed way before Unicode 3.1 came and hence Java's char primitive is inadequate to represent Unicode 3.1 and up: there's not a "one Unicode character to one Java char" mapping anymore (instead a monstrous hack is used).

So you really have to check your requirements here: do you need to support Java char or any possible Unicode character?

Upvotes: 77

Chathuranga Chandrasekara
Chathuranga Chandrasekara

Reputation: 20956

I found this nice code on web.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class Unicode {

public static void main(String[] args) {
System.out.println("Use CTRL+C to quite to program.");

// Create the reader for reading in the text typed in the console. 
InputStreamReader inputStreamReader = new InputStreamReader(System.in);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);

try {
  String line = null;
  while ((line = bufferedReader.readLine()).length() > 0) {
    for (int index = 0; index < line.length(); index++) {

      // Convert the integer to a hexadecimal code.
      String hexCode = Integer.toHexString(line.codePointAt(index)).toUpperCase();


      // but the it must be a four number value.
      String hexCodeWithAllLeadingZeros = "0000" + hexCode;
      String hexCodeWithLeadingZeros = hexCodeWithAllLeadingZeros.substring(hexCodeWithAllLeadingZeros.length()-4);

      System.out.println("\\u" + hexCodeWithLeadingZeros);
    }

  }
} catch (IOException ioException) {
       ioException.printStackTrace();
  }
 }
}

Original Article

Upvotes: 1

Related Questions