Pradeep Bhadani

Reputation: 4751

Convert UTF-8 encoded string to human readable string

How can I convert a UTF-8 encoded string into a human-readable string?

Like : ⬠(in UTF8) is €

I tried using Charset, but it did not work.

Upvotes: 1

Views: 15831

Answers (5)

PbxMan

Reputation: 7623

You are trying to decode a byte array that was encoded with "ISO-8859-15" as if it were "UTF-8".

    // getBytes(String) and new String(byte[], String) throw UnsupportedEncodingException
    byte[] b = "Üü?öäABC".getBytes("ISO-8859-15");
    byte[] u = "Üü?öäABC".getBytes("UTF-8");

    System.out.println(new String(b, "ISO-8859-15")); // will be ok
    System.out.println(new String(b, "UTF-8")); // will look garbled
    System.out.println(new String(u, "UTF-8")); // will be ok

Upvotes: 1

user684934

I think the problem here is that you're assuming a Java String is encoded with whatever charset you've specified in the constructor. It's not. Internally, it is UTF-16.

So, "Üü?öäABC".getBytes("ISO-8859-15") is actually converting a UTF-16 string to ISO-8859-15, and then getting the byte representation of that.

If you want to get the human-readable form in your Eclipse console, just keep the string as it is and call System.out.println("Üü?öäABC"). Java encodes the string for the output stream when printing, and the console displays it correctly as long as its encoding matches.
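To illustrate the difference between a String and its encoded bytes, here is a minimal sketch. The class name Utf16Demo and its helper methods are hypothetical, and it assumes the ISO-8859-15 charset is available in your JDK (it is in the standard ones I know of). The string literals use \u escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    // Number of UTF-16 code units the String holds internally.
    static int utf16CodeUnits(String s) {
        return s.length();
    }

    // Number of bytes produced when the String is encoded with the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign
        Charset latin9 = Charset.forName("ISO-8859-15"); // assumed to be available

        System.out.println(utf16CodeUnits(euro));                         // 1
        System.out.println(encodedLength(euro, StandardCharsets.UTF_8));  // 3
        System.out.println(encodedLength(euro, latin9));                  // 1
    }
}
```

The same String yields a different byte count for each charset, which is why the bytes only make sense together with the name of the charset that produced them.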

Upvotes: 0

Grim

Reputation: 1648

A string in Java is already a Unicode representation. When you call one of the getBytes methods on it, you get an encoded representation (as bytes, thus binary values) in a specific encoding, ISO-8859-15 in your example. If you want to convert this byte array back to a Unicode string, you can do that with one of the String constructors that accept a byte array, as you did, but you must use the exact same encoding the byte array was originally generated with. Only then can you convert it back to a Unicode string (which has no encoding, and doesn't need one).
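The round trip described above can be sketched as follows. RoundTripDemo and roundTrip are hypothetical names, and the sample text is written with \u escapes (it spells "ÜüöäABC") to avoid depending on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encode the text with one charset, then decode the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15"); // assumed to be available
        String original = "\u00DC\u00FC\u00F6\u00E4ABC";

        // Same charset on both sides: the text survives.
        System.out.println(original.equals(roundTrip(original, latin9, latin9))); // true

        // Mismatched charsets: the ISO-8859-15 bytes are not valid UTF-8,
        // so the decoder substitutes replacement characters.
        System.out.println(original.equals(
                roundTrip(original, latin9, StandardCharsets.UTF_8))); // false
    }
}
```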

Beware of the encoding-less methods, both the string constructor and the getBytes method, since they use the default encoding of the platform the code is running on, which might not be what you want to achieve.
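As a hedged illustration of that advice, the sketch below always names the charset explicitly through StandardCharsets (available since Java 7). ExplicitCharsetDemo, encodeUtf8, and decodeUtf8 are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encode with an explicit charset instead of the platform default.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What text.getBytes() and new String(bytes) would silently use;
        // the value differs between machines and JVM settings.
        System.out.println("Platform default: " + Charset.defaultCharset());

        String text = "\u20AC 5"; // euro sign followed by " 5"
        System.out.println(text.equals(decodeUtf8(encodeUtf8(text)))); // true
    }
}
```

Using the Charset overloads also avoids the checked UnsupportedEncodingException thrown by the overloads that take a charset name.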

Upvotes: 1

Esailija

Reputation: 140234

This is not "UTF-8" but completely broken and irreparable data. Strings do not have encodings, so it makes no sense to say "UTF-8 string" in this context. A String is a sequence of abstract characters; it has no encoding except as an internal implementation detail, which is not our concern and is unrelated to your problem.

Upvotes: 1

jdb

Reputation: 4519

You are encoding the string to ISO-8859-15 with byte[] b = "Üü?öäABC".getBytes("ISO-8859-15"); and then decoding it as UTF-8 in System.out.println(new String(b, "UTF-8"));. You have to decode it with the same charset, ISO-8859-15.

Upvotes: 1
