dibugger

Reputation: 576

How to decode emoji (unicode) in HBase using Java API?

My HBase table contains some UTF-8 encoded emoji, such as \xF0\x9F\x8C\x8F and \xE2\x9A\xBE. I am trying to decode them with Bytes.toString(). This method uses UTF-8, but it only seems to decode three-byte sequences like \xE2\x9A\xBE correctly; four-byte sequences like \xF0\x9F\x8C\x8F show up as a question mark (see below). How can I decode the four-byte sequences to emoji and print them out? Does anybody have an idea? Thanks in advance!

Example:

The result should be: [image]

But I got: [image]

Sorry, I forgot to mention that I am using a servlet to query HBase and write the content to the response.

Upvotes: 0

Views: 1414

Answers (1)

Japu_D_Cret

Reputation: 638

I read a file that contains the character "🌏" (UTF-8 bytes F0 9F 8C 8F, code point U+1F30F). The file has a BOM that indicates UTF-8 encoding, and I decode it as UTF-8 with:

byte[] encoded = Files.readAllBytes(selectedFile.toPath());
String fileContents = new String(encoded, StandardCharsets.UTF_8);

The resulting String is decoded correctly and displays correctly in my Java Swing application. But if I print the same String to the console, I get a boxed question mark instead of the symbol. So the four bytes are decoded correctly; it is the output step that garbles the character.

To reproduce this, you can run the following:

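import java.nio.charset.StandardCharsets;
import javax.swing.JDialog;
import javax.swing.JLabel;
import javax.swing.WindowConstants;

// The class name is arbitrary; it is only here to make the snippet compile.
public class EncodingTest {
  public static void main(String[] args) throws Exception {
    // UTF-8 bytes of U+1F30F, the same four bytes as in the question
    byte[] encoded = { (byte) 0xF0, (byte) 0x9F, (byte) 0x8C, (byte) 0x8F };
    String convertedstring = new String(encoded, StandardCharsets.UTF_8);

    // Console output: depends on the console's encoding and font
    System.out.println("convertedstring: " + convertedstring);

    // Swing output: shows the same String in a small dialog
    JDialog dialog = new JDialog();
    dialog.setSize(300, 100);
    dialog.setLocationRelativeTo(null);
    dialog.setTitle("encoding-test");
    dialog.setDefaultCloseOperation(WindowConstants.DISPOSE_ON_CLOSE);
    JLabel label = new JLabel("convertedstring: " + convertedstring);
    dialog.add(label);

    dialog.setVisible(true);
  }
}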

Console Output

[image: the console shows a boxed question mark]

JDialog Output

[image: the dialog shows the symbol correctly]

You might also want to look at "Default character encoding for java console output" and "Java, UTF-8, and Windows console".
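To check the point above without relying on what a console can display, here is a minimal sketch that decodes the four bytes, inspects the code point, and then writes the String back out with an explicit charset. The class name EmojiRoundTrip and the helper names decode and writeAs are made up for illustration. The ISO-8859-1 case is only there to show how the wrong output charset loses the character (it typically comes out as a question mark).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of HBase or the original answer.
public class EmojiRoundTrip {

    // Decode raw UTF-8 bytes (as stored in the table) into a Java String.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Write the text through a Writer with an explicit charset and
    // return the bytes that actually reached the underlying stream.
    static byte[] writeAs(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = { (byte) 0xF0, (byte) 0x9F, (byte) 0x8C, (byte) 0x8F };
        String decoded = decode(stored);

        // One code point, but two Java chars (a surrogate pair).
        System.out.println("chars: " + decoded.length());
        System.out.println("code point: U+"
                + Integer.toHexString(decoded.codePointAt(0)).toUpperCase());

        // Writing as UTF-8 reproduces the stored bytes exactly.
        System.out.println("UTF-8 round trip ok: "
                + Arrays.equals(stored, writeAs(decoded, StandardCharsets.UTF_8)));

        // Writing with a charset that cannot represent the emoji loses it.
        System.out.println("ISO-8859-1 round trip ok: "
                + Arrays.equals(stored, writeAs(decoded, StandardCharsets.ISO_8859_1)));

        // Console printing with a forced UTF-8 encoding; whether the glyph
        // shows up still depends on the terminal and its font.
        PrintStream console = new PrintStream(System.out, true, "UTF-8");
        console.println(decoded);
    }
}
```

If the UTF-8 round trip succeeds, the decoding side is fine and only the output encoding needs attention. In a servlet, the analogous step would presumably be calling response.setCharacterEncoding("UTF-8") before obtaining the writer; that is not shown here because it needs a servlet container.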

Upvotes: 1
