user2466704

Reputation: 23

Converting byte[] to String Java. Adding a ? in every conversion

I'm writing Java code that reads a file 64 KB at a time and converts each 64 KB block to a String. I can read the bytes and keep them in a vector, but when I convert them to a String there are always more characters than there should be. For example:

Converting %Çì¢ with String s = new String(byte[], "UTF8") gives me %????

Converting %Çì¢ with String s = new String(byte[]) or new String(byte[], "Cp1252"), etc. gives me %Çì?¢, which would be perfect if it weren't for the ?.

Can anyone help me? I have tried every way I could find to convert byte[] to String :(

Upvotes: 0

Views: 124

Answers (3)

fge

Reputation: 121800

Don't use String for binary data. It cannot work.

Strings in Java at runtime are sequences of chars, and not all byte sequences can be converted to chars.

If you need a String representation of binary data, use a dedicated format which can do so (base64 comes to mind).
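As a rough illustration of the base64 suggestion, here is a minimal sketch that assumes Java 8 or later (for java.util.Base64). The class name Base64RoundTrip and the sample bytes are made up for the example; the bytes are only a guess at what the question's %Çì?¢ might look like in Cp1252.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    // Encode arbitrary bytes as an ASCII-only String; no byte value is lost.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recover the original bytes from the Base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Hypothetical sample: includes 0x8F, which has no mapping in Cp1252.
        byte[] raw = {(byte) 0x25, (byte) 0xC7, (byte) 0xEC, (byte) 0x8F, (byte) 0xA2};
        String text = encode(raw);
        System.out.println(text);
        // The round trip gives back exactly the bytes that went in.
        System.out.println(Arrays.equals(raw, decode(text)));
    }
}
```

The resulting String is not human-readable text, but it survives any charset because it only contains ASCII characters.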

See here for the full story; there is an example at the end showing why String for binary cannot work.

Here is some sample code which will yell at you (i.e., throw an exception) if your byte array cannot be converted to a string:

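import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Report malformed or unmappable input as an exception
final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
    .onMalformedInput(CodingErrorAction.REPORT)
    .onUnmappableCharacter(CodingErrorAction.REPORT);

try {
    decoder.decode(ByteBuffer.wrap(yourByteArray));
} catch (CharacterCodingException e) {
    System.err.println("No can't do...");
    e.printStackTrace(System.err);
}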

By default, the String constructors replace malformed or unmappable byte sequences with a substitute character and don't raise an error.

And of course, there is no guarantee that a valid byte sequence ends exactly on a 64k boundary: a multi-byte character can be split across two blocks.
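If the file really is text, one way to cope with block boundaries is to keep a single CharsetDecoder alive across blocks and carry over any incomplete trailing bytes. The following is only a sketch under that assumption: ChunkedDecode, decodeInChunks and naiveDecode are hypothetical names, and the tiny chunk sizes are there to force a split (a real block would be 64 * 1024 bytes).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ChunkedDecode {
    // Decodes UTF-8 block by block with one decoder; chunkSize must be positive.
    // Throws CharacterCodingException if the data is not valid UTF-8.
    static String decodeInChunks(byte[] data, int chunkSize) throws CharacterCodingException {
        if (data.length == 0) return "";
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        // Extra room for up to 3 leftover bytes of an incomplete sequence.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer out = CharBuffer.allocate(chunkSize + 4);
        StringBuilder text = new StringBuilder();
        int pos = 0;
        while (pos < data.length) {
            int n = Math.min(chunkSize, data.length - pos);
            in.put(data, pos, n);
            pos += n;
            in.flip();
            // endOfInput is true only for the final block.
            CoderResult result = decoder.decode(in, out, pos == data.length);
            if (result.isError()) result.throwException();
            out.flip();
            text.append(out);
            out.clear();
            in.compact(); // keep undecoded trailing bytes for the next block
        }
        CoderResult result = decoder.flush(out);
        if (result.isError()) result.throwException();
        out.flip();
        text.append(out);
        return text.toString();
    }

    // What the question does: every block is decoded on its own.
    static String naiveDecode(byte[] data, int chunkSize) {
        StringBuilder text = new StringBuilder();
        for (int pos = 0; pos < data.length; pos += chunkSize) {
            int n = Math.min(chunkSize, data.length - pos);
            text.append(new String(data, pos, n, StandardCharsets.UTF_8));
        }
        return text.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String original = "%\u00C7\u00EC\u00A2 \u20AC";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(original.equals(decodeInChunks(utf8, 3)));
        System.out.println(original.equals(naiveDecode(utf8, 3)));
    }
}
```

The naive version ends up with replacement characters wherever a character straddles a boundary, which matches the "more characters than there should be" symptom. None of this helps if the file is binary to begin with.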

Upvotes: 2

dkatzel

Reputation: 31648

I think the UTF-8 needs a hyphen. Try this:

String s = new String(myArray, "UTF-8");

EDIT

There might also be a problem with decoding UTF-8 separately for each 64k block. The byte order mark (BOM) will only be in the first bytes of the file, not in every 64k block. It might be better to use a Reader:

BufferedReader in = new BufferedReader(
         new InputStreamReader(new FileInputStream(myFile), "UTF-8"));

AND you might also have a problem printing the String to the console if the console isn't UTF-8, so you have to change the encoding there too.

PrintStream out = new PrintStream(System.out, true, "UTF-8");

String str;

while ((str = in.readLine()) != null) {
    out.println(str);
}
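If you want blocks rather than lines, a Reader can also fill a char[] buffer, and it takes care of characters that span the underlying byte reads. This is a rough sketch, not tested against your file: ReaderChunks and readText are made-up names, the ByteArrayInputStream stands in for the FileInputStream above, and stripping a leading U+FEFF is my assumption about how you would want a BOM handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ReaderChunks {
    // Reads UTF-8 text in blocks of up to bufferChars chars (must be positive).
    static String readText(InputStream source, int bufferChars) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(source, StandardCharsets.UTF_8)) {
            char[] buffer = new char[bufferChars];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                // Each block holds whole chars, never half of a multi-byte sequence.
                text.append(buffer, 0, n);
            }
        }
        // Java's UTF-8 decoder passes a BOM through as U+FEFF; drop it if present.
        if (text.length() > 0 && text.charAt(0) == '\uFEFF') {
            text.deleteCharAt(0);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file: BOM followed by a few non-ASCII characters.
        byte[] bytes = "\uFEFFh\u00E9llo \u20AC".getBytes(StandardCharsets.UTF_8);
        String text = readText(new ByteArrayInputStream(bytes), 2);
        System.out.println(text.length());
    }
}
```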

Upvotes: 0

user2591612

Reputation:

I don't know what those characters are supposed to be, but this is the proper way to use String with data that is UTF-8, if that's the actual encoding:

   byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};
   try
   {
      String value = new String(byteArray, "UTF-8");
      System.out.println(value);
   }
   catch (UnsupportedEncodingException ex)
   {
      // do something
   }

Output:

WOW...
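As a side note, if you are on Java 7 or later you can probably pass the StandardCharsets.UTF_8 constant instead of the charset name, which avoids the checked UnsupportedEncodingException altogether. A small sketch (the class name Utf8Constant is just for illustration):

```java
import java.nio.charset.StandardCharsets;

class Utf8Constant {
    // The Charset overload declares no checked exception.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};
        System.out.println(decode(byteArray)); // prints WOW...
    }
}
```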

Upvotes: 0
