Reputation: 576
In my HBase table there are some encoded emoji, like \xF0\x9F\x8C\x8F and \xE2\x9A\xBE. I am trying to decode them with Bytes.toString(). However, this method uses UTF-8, and it only seems to decode three-byte sequences like \xE2\x9A\xBE; four-byte sequences like \xF0\x9F\x8C\x8F show up as a question mark (see below). How can I decode the four-byte sequences to emoji and print them out? Does anybody have an idea? Thanks in advance!
Example:
Edit: Sorry, I forgot to mention that I am using a servlet to query HBase and write the content to the response.
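Since the content is written to a servlet response, one likely culprit (an assumption, as the servlet code isn't shown) is the response charset. If the servlet never sets one, for example with response.setCharacterEncoding("UTF-8") before calling getWriter(), the container typically falls back to ISO-8859-1, which has no mapping for emoji, so they are replaced with "?". The sketch below simulates that encode/decode round trip using only the standard library; ResponseEncodingDemo and roundTrip are hypothetical names, and no Servlet API is involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseEncodingDemo {
    // U+1F30F decoded from its four UTF-8 bytes (stored in Java as a surrogate pair)
    static final String GLOBE = new String(
            new byte[] { (byte) 0xF0, (byte) 0x9F, (byte) 0x8C, (byte) 0x8F },
            StandardCharsets.UTF_8);

    // Mimics a response writer: encode with the response charset, then decode
    // the same bytes as a client would. String.getBytes substitutes the
    // charset's replacement byte ('?' for ISO-8859-1) for unmappable characters.
    static String roundTrip(String s, Charset responseCharset) {
        byte[] wire = s.getBytes(responseCharset);
        return new String(wire, responseCharset);
    }

    public static void main(String[] args) {
        // false: ISO-8859-1 cannot represent the emoji
        System.out.println(roundTrip(GLOBE, StandardCharsets.ISO_8859_1).equals(GLOBE));
        // true: UTF-8 round-trips it unchanged
        System.out.println(roundTrip(GLOBE, StandardCharsets.UTF_8).equals(GLOBE));
    }
}
```

If this matches what you see, setting the response encoding to UTF-8 (or a content type such as "text/html; charset=UTF-8") before obtaining the writer should be the first thing to try.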
Upvotes: 0
Views: 1414
Reputation: 638
When I read a file that contains the character "🌏" (F09F8C8F, i.e. U+1F30F) and starts with a BOM indicating UTF-8, and I decode it as UTF-8 using
byte[] encoded = Files.readAllBytes(selectedFile.toPath());
String fileContents = new String(encoded, StandardCharsets.UTF_8);
the resulting String is decoded correctly and displayed correctly in my Java Swing application. But if I print the same String to the console, I get a boxed question mark instead of the symbol. So the character is decoded correctly; it's just the output that messes it up.
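To confirm that decoding is not the problem without depending on how a console or page renders the result, you can inspect code points instead of printing the glyph. The sketch below (DecodeCheck and decodeUtf8 are hypothetical names) decodes the bytes as UTF-8, which is also what the question says Bytes.toString does, and drops a leading BOM, because Java's UTF-8 decoder keeps it as U+FEFF rather than stripping it.

```java
import java.nio.charset.StandardCharsets;

public class DecodeCheck {
    // Decodes UTF-8 bytes and removes a leading byte-order mark, which
    // Java's UTF-8 decoder leaves in the String as U+FEFF.
    static String decodeUtf8(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return (!s.isEmpty() && s.charAt(0) == '\uFEFF') ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 BOM; F0 9F 8C 8F encodes U+1F30F
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                           (byte) 0xF0, (byte) 0x9F, (byte) 0x8C, (byte) 0x8F };
        String s = decodeUtf8(withBom);
        System.out.println("chars: " + s.length());                            // chars: 2
        System.out.println("code points: " + s.codePointCount(0, s.length())); // code points: 1
        System.out.printf("U+%X%n", s.codePointAt(0));                          // U+1F30F
    }
}
```

A four-byte UTF-8 sequence always becomes a surrogate pair in a Java String, so length() reports 2 while codePointCount reports 1; that is expected and not a decoding error.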
To reproduce this, you can run:
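import java.nio.charset.StandardCharsets;
import javax.swing.JDialog;
import javax.swing.JLabel;
import javax.swing.SwingUtilities;
import javax.swing.WindowConstants;

public class EncodingTest {
    public static void main(String[] args) {
        // UTF-8 encoding of U+1F30F
        byte[] encoded = { (byte) 0xF0, (byte) 0x9F, (byte) 0x8C, (byte) 0x8F };
        String convertedstring = new String(encoded, StandardCharsets.UTF_8);
        // The console may show "?" or a box here, depending on its encoding and font
        System.out.println("convertedstring: " + convertedstring);
        // Swing components should be created on the Event Dispatch Thread
        SwingUtilities.invokeLater(() -> {
            JDialog dialog = new JDialog();
            dialog.setSize(300, 100);
            dialog.setLocationRelativeTo(null);
            dialog.setTitle("encoding-test");
            dialog.setDefaultCloseOperation(WindowConstants.DISPOSE_ON_CLOSE);
            dialog.add(new JLabel("convertedstring: " + convertedstring));
            dialog.setVisible(true);
        });
    }
}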
(Screenshots omitted: "Console Output" and "JDialog Output")
You might also want to see the questions "Default character encoding for java console output" and "Java, UTF-8, and Windows console".
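If you control how the program runs, one option is to make System.out encode as UTF-8 explicitly, since by default it uses a platform- or console-dependent charset that may not cover this character. This is a sketch assuming Java 10+ (for the PrintStream constructor that takes a Charset); Utf8Console and utf8 are hypothetical names. It only fixes the bytes Java writes: the terminal must still decode UTF-8 (on Windows, for example, code page 65001) and use a font that contains the glyph.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Wraps a stream so that printed Strings are encoded as UTF-8,
    // independent of the platform default charset (Java 10+ constructor).
    static PrintStream utf8(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Replace System.out with a UTF-8 stream on the same file descriptor
        System.setOut(utf8(new FileOutputStream(FileDescriptor.out)));
        String globe = new String(new int[] { 0x1F30F }, 0, 1);
        // Whether this renders as the globe depends on the terminal, not Java
        System.out.println("convertedstring: " + globe);
    }
}
```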
Upvotes: 1