Philippe

Reputation: 446

UTF-8 difference between Oracle and Java

I have the following Unicode difference between an Oracle database and Java.

If I run the following in Oracle SQL Developer:

select unistr('\008C') from dual;

I get the following Unicode character: http://www.utf8icons.com/character/140/control-character

However, if I try to perform the same kind of Unicode code point to string conversion in Java:

String s1 = new String("\u008C".getBytes(), "UTF-8");

I get an empty char as a result.

I understand I could use the \u0152 character, which displays the glyph I need properly in both Java and Oracle, but I would like to understand why this difference exists. I tried playing with my fonts but did not get any decent result. Thanks.
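For reference, a quick check shows that both notations name the same code point, and that the Java string is not actually empty:

    String s1 = "\u008C";
    System.out.println(s1.length());        // prints 1 -- the string is not empty
    System.out.println((int) s1.charAt(0)); // prints 140 (0x8C), same code point as unistr('\008C')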

Upvotes: 3

Views: 1855

Answers (2)

Alastair McCormack

Reputation: 27704

This makes no sense:

String s1 = new String("\u008C".getBytes(), "UTF-8");

If you're lucky, your default encoding will be UTF-8 and you'll get:

s1.equals("\u008C") == true

This is because .getBytes() defaults to your system encoding. You're effectively encoding to an unknown (but discoverable) encoding and decoding from UTF-8.

If you're unlucky, your default encoding will be something else and you'll have mojibaked your string.
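A minimal sketch of that failure mode (assuming the windows-1252 charset is available in your JDK, which it normally is):

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    String original = "\u0152"; // LATIN CAPITAL LIGATURE OE, a printable stand-in

    // Encoding and decoding with the SAME charset round-trips cleanly:
    byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
    System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(original)); // true

    // Encoding with one charset and decoding with another mojibakes the string:
    byte[] cp1252 = original.getBytes(Charset.forName("windows-1252")); // one byte, 0x8C
    System.out.println(new String(cp1252, StandardCharsets.UTF_8));     // U+FFFD replacement character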

If what you meant to say was:

 System.out.println( "\u008C" );

produces nothing, it's because U+008C 'PARTIAL LINE BACKWARD' is a control character, i.e. it's non-printing. It should never be printed. It would seem that some UIs automatically render this character as 'LATIN CAPITAL LIGATURE OE' (U+0152), but that behaviour is implementation-dependent.
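You can verify the control-character claim in Java itself with java.lang.Character:

    char c = '\u008C'; // PARTIAL LINE BACKWARD, in the C1 control range U+0080..U+009F
    System.out.println(Character.isISOControl(c));                 // true
    System.out.println(Character.getType(c) == Character.CONTROL); // true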

For example, if I create an HTML document with Œ in it, it displays in Chrome as Œ. Copy this character to your clipboard, paste it into a document, and save it as UTF-16 BE. Hex dump the file and you will see:

0000000 01 52 

That's the Unicode code point / UTF-16 encoding of 'LATIN CAPITAL LIGATURE OE'. Therefore, the Oracle SQL Developer tool is just deceiving/helping you by displaying 'LATIN CAPITAL LIGATURE OE' instead.
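The same two bytes fall out of Java directly; a short sketch reproducing the dump:

    import java.nio.charset.StandardCharsets;

    byte[] bytes = "\u0152".getBytes(StandardCharsets.UTF_16BE);
    for (byte b : bytes) {
        System.out.format("%02x ", b & 0xFF); // prints: 01 52
    }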

Upvotes: 2

krokodilko

Reputation: 36087

String.getBytes() converts a string into a sequence of bytes using the platform's default encoding. It is equivalent to:

String encoding = System.getProperty("file.encoding");
"\u008C".getBytes( encoding );

The result of this call therefore depends on the platform's default encoding.

For example, my PC uses the Cp1250 code page, and I get this result:

    System.out.println( System.getProperty("file.encoding") ); // name of the platform default encoding
    byte b[] = "\u008C".getBytes();                             // encodes using that default
    for( byte bb: b ) System.out.format("%x\n", bb);
    -------
    Cp1250
    3f

As you see, the U+008C character was converted to one byte, 3f, which in Cp1250 is the ? character. I believe this is because there is no U+008C character in Cp1250, so the CharsetEncoder (which is used by the getBytes() method to convert Unicode strings to a specific charset) converts it to ? in this case.

See here for more details: http://docs.oracle.com/javase/7/docs/api/java/nio/charset/CharsetEncoder.html
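To see the substitution as an explicit error rather than a silent ?, you can drive the encoder yourself; a sketch assuming the Cp1250 charset is installed (it normally is):

    import java.nio.CharBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetEncoder;

    // A bare encoder REPORTs unmappable characters; String.getBytes() configures
    // its encoder to REPLACE them with the charset's replacement byte ('?' here).
    CharsetEncoder encoder = Charset.forName("Cp1250").newEncoder();
    try {
        encoder.encode(CharBuffer.wrap("\u008C"));
    } catch (CharacterCodingException e) {
        System.out.println(e); // java.nio.charset.UnmappableCharacterException
    }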

As you see, your Java code converts a Unicode string to bytes in your platform's default encoding, and then treats the resulting byte array as if it were UTF-8, when in fact it is encoded in some other charset.
This doesn't make sense.
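If the byte round trip is really needed, a minimal sketch of the symmetric version, naming the charset explicitly on both sides so the platform default never enters into it:

    import java.nio.charset.StandardCharsets;

    String s1 = "\u008C"; // if you only need the character, the literal alone suffices

    byte[] utf8 = s1.getBytes(StandardCharsets.UTF_8);  // two bytes: c2 8c
    String roundTripped = new String(utf8, StandardCharsets.UTF_8);
    System.out.println(roundTripped.equals(s1));        // true on every platform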

Upvotes: 1
