Ajay Jayavarapu

Reputation: 479

How to convert Telugu Characters into UTF-8 encoded characters in Java?

I have an input character such as ఈ. For this character I need the equivalent hex value, "0C08". Is there a built-in function in Java for this?

Thanks in advance.

Upvotes: 1

Views: 3387

Answers (2)

bmargulies

Reputation: 100153

Java strings are UTF-16. To get UTF-8, you write something like:

// requires: import java.nio.charset.StandardCharsets; (Java 7+)
String string = "SomethingInTeluguOrwhatever";
byte[] utf8Bytes = string.getBytes(StandardCharsets.UTF_8);

That gets you the UTF-8 values. If you want hex, iterate the bytes and print them in hex.
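As a minimal sketch of "iterate the bytes and print them in hex": the class and method names below (Utf8Hex, toUtf8Hex) are made up for illustration and are not part of any library. The Telugu letter ఈ is written as the escape \u0C08 so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Hex {

    // Encodes the text as UTF-8 and returns the bytes as uppercase hex digits.
    static String toUtf8Hex(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder(utf8Bytes.length * 2);
        for (byte b : utf8Bytes) {
            // Mask with 0xFF so negative byte values are treated as 0..255.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtf8Hex("\u0C08")); // prints E0B088
    }
}
```

Note that this gives the three UTF-8 bytes (E0 B0 88), not the code point 0C08 the question mentions.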

Upvotes: 0

krzydyn

Reputation: 1032

Characters in Java are stored as Unicode (UTF-16 code units), so an encoding has to be specified whenever text is read from or written to an external byte stream.

Note that this code should print the same text twice on a UTF-8 console:

String value = columnDetails.getColumnName();
System.out.println(value); //output with default encoding
System.out.write(value.getBytes("UTF-8"));//output with UTF-8

Edit: If you want the hex representation of the UTF-8 encoding, try this:

//not optimized
String toHex(byte[] b) {
  String s="";
  for (int i=0; i<b.length; ++i) s+=String.format("%02X",b[i]&0xff);
  return s;
}
System.out.println(toHex("ఈ".getBytes("UTF-8"))); //prints E0B088

Edit2: Or, if you want the Unicode value (the two-byte UTF-16 representation):

static String toHex(String b) {
  String s="";
  for (int i=0; i<b.length(); ++i) s+=String.format("%04X",b.charAt(i)&0xffff);
  return s;
}
System.out.println(toHex("ఈ")); //prints 0C08
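The charAt version above works on UTF-16 code units, which is enough for Telugu since those letters fit in a single char. As a hedged variation, here is a sketch that walks whole code points instead, so a character outside the Basic Multilingual Plane comes out as one value rather than two surrogate halves. The names CodePointHex and toCodePointHex are hypothetical, and String.codePoints() needs Java 8 or later.

```java
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class CodePointHex {

    // Returns each Unicode code point as at least four uppercase hex digits,
    // separated by spaces.
    static String toCodePointHex(String text) {
        return text.codePoints()
                   .mapToObj(cp -> String.format("%04X", cp))
                   .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toCodePointHex("\u0C08")); // prints 0C08
    }
}
```

For a single BMP character the result is the same as the charAt loop; the two only differ when the string contains surrogate pairs.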

Upvotes: 1
