osanchezmon

Reputation: 544

How to convert a binary string to a Java String encoded using UTF-8

To send a chunk of bits from a four-word String, I get the byte array from the String and build the bit string from it.

StringBuilder binaryStr = new StringBuilder();

byte[] bytesFromStr = str.getBytes("UTF-8");
for (int i = 0, l = bytesFromStr.length; i < l; i++) {
    binaryStr.append(Integer.toBinaryString(bytesFromStr[i]));
}

String result = binaryStr.toString();

The problem appears when I want to do the reverse operation: converting a bit string to a Java String encoded using UTF-8.

Can someone explain the best way to do that?

Thanks in advance!

Upvotes: 0

Views: 640

Answers (2)

osanchezmon

Reputation: 544

Thanks @Andreas for your code. I tested it with your function, then decoded the bit string back into a String using this:

StringBuilder revealStr = new StringBuilder();
for (int i = 0; i < result.length(); i += 8) {
    revealStr.append((char) Integer.parseUnsignedInt(result.substring(i, i + 8), 2));
} 
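
One caveat with the loop above: casting each 8-bit group to char treats every byte as its own character, which matches UTF-8 only for ASCII input. A minimal sketch of a decoder that collects the bytes first and lets String do the UTF-8 decoding, assuming the bit string length is a multiple of 8 (the class and method names are illustrative, not from the answers):

```java
import java.nio.charset.StandardCharsets;

class BitStringDecoder {
    // Hypothetical helper: turns a string of '0'/'1' characters back into text.
    static String fromBinary(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("bit string length must be a multiple of 8");
        }
        byte[] bytes = new byte[bits.length() / 8];
        for (int i = 0; i < bytes.length; i++) {
            // parseInt yields 0..255; the cast to byte keeps the low 8 bits.
            bytes[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        // Decode all bytes together so multi-byte UTF-8 sequences are handled.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9.
        System.out.println(fromBinary("1100001110101001"));
    }
}
```

Decoding the whole byte array at once is what lets characters outside ASCII survive the round trip, since they span more than one byte.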

Thanks to everyone for helping me.

Upvotes: 0

Andreas

Reputation: 159260

TL;DR Don't use toBinaryString(). See solution at the end.


Your problem is that Integer.toBinaryString() doesn't return leading zeroes, e.g.

System.out.println(Integer.toBinaryString(1));   // prints: 1
System.out.println(Integer.toBinaryString(10));  // prints: 1010
System.out.println(Integer.toBinaryString(100)); // prints: 1100100

For your purpose, you want to always get 8 bits for each byte.

You also need to handle negative values: a Java byte is signed, so it is sign-extended to 32 bits when widened to int, e.g.

System.out.println(Integer.toBinaryString((byte)129)); // prints: 11111111111111111111111110000001

The easiest way to accomplish that is:

Integer.toBinaryString((b & 0xFF) | 0x100).substring(1)

First, it coerces the byte b to an int and retains only the lower 8 bits, then sets the 9th bit, e.g. 129 (decimal) becomes 1 1000 0001 (binary, spaces added for clarity). The substring(1) call then drops that 9th bit, which ensures the leading zeroes are in place.
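
To make the intermediate values visible, here is a small sketch that returns each stage of that expression as a binary string. The class and method names are made up for illustration:

```java
class MaskDemo {
    // Returns { masked value, value with 9th bit set, final 8-character result }.
    static String[] stages(byte b) {
        int masked = b & 0xFF;        // clears the sign-extension bits
        int marked = masked | 0x100;  // sets bit 8 so the string is always 9 digits
        String nineBits = Integer.toBinaryString(marked);
        return new String[] {
            Integer.toBinaryString(masked),
            nineBits,
            nineBits.substring(1)     // drops the marker bit, keeps 8 digits
        };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" -> ", stages((byte) 1)));
        // prints: 1 -> 100000001 -> 00000001
        System.out.println(String.join(" -> ", stages((byte) 129)));
        // prints: 10000001 -> 110000001 -> 10000001
    }
}
```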

It's better to have that as a helper method:

private static String toBinary(byte b) {
    return Integer.toBinaryString((b & 0xFF) | 0x100).substring(1);
}

In which case your code becomes:

StringBuilder binaryStr = new StringBuilder();
for (byte b : str.getBytes("UTF-8"))
    binaryStr.append(toBinary(b));
String result = binaryStr.toString();

E.g. if str = "Hello World", you get:

0100100001100101011011000110110001101111001000000101011101101111011100100110110001100100

You could of course just do it yourself, without resorting to toBinaryString():

StringBuilder binaryStr = new StringBuilder();
for (byte b : str.getBytes("UTF-8"))
    for (int i = 7; i >= 0; i--)
        binaryStr.append((b >> i) & 1);
String result = binaryStr.toString();

That will probably run faster too.

Upvotes: 2
