Nicholas

Reputation: 27

How to convert a binary String that starts with "0" to a byte, and recover it without losing the leading "0"?

I have recently been learning about Huffman trees and Huffman coding in Java. I ran into a serious problem when trying to encode the binary String. I will explain it step by step.

1, These are the original words that I'm going to encode:

String words = "aa bbb cccc";

2, The Huffman code map. These are not the real codes; I made some up so that I can show the problem as quickly as possible:

a:1110 0110 (ignore the space, I deliberately added it for human readability)
b:0001 1001
c:011

3, I convert the binary string into a byte array.

byte[] arr = {-20,31,3};

4, Decoding. I combine each byte with 256 (binary 1 0000 0000) using bitwise OR in order to get the leading 0s back. Then I take the last 8 characters with substring(..).

for(int i = 0; i < bytes.length; i++) {
    boolean isLast = i == bytes.length - 1;
    String byteToString = byteToString(isLast, bytes[i]);
    System.out.println(byteToString + " ");
}
public static String byteToString(boolean isLast,byte b) {
    int temp = b;
    if(!isLast) {
        temp = 256 | temp;  //temp = 256 | -20;(256 | 31)
    }
        
    String stringDecode = Integer.toBinaryString(temp);
    if(!isLast){
        return stringDecode.substring(stringDecode.length() - 8);
    } else {
        return stringDecode;
    }
}

result:

11101100
00011111
11   // the 0 is missing.

I tried "temp = 256 | 3" for the last byte as well, but then I got "00000011", which has too many 0s. I don't know whether I have made myself clear. If someone could help me, I would really appreciate it. Thank you.

================= edit

1, String to encode

String sentence = "aaaa bb c dddd hhhhhhh jjk";
byte[] contentBytes = sentence.getBytes();   // contentBytes.length: 29

2, ASCII codes and the number of occurrences of each character

32:5 97:4 98:2 99:1 100:4 104:7 106:2 107:1

3, The Huffman tree (shown as an image in the original post)

4, The Huffman codes

32:00

97:111

98:010

99:01101

100:110

104:10

106:0111

107:01100

5, Encoding the sentence: concatenating the Huffman codes of all characters into one String.

111111111111000100100001101001101101101100010101010101010000111011101100

6, Splitting the String into chunks of 8 bits (chars)

11111111, 11110001, 00100001, 10100110, 11011011, 00010101, 01010101, 00001110, 11101100

7, Converting each 8-bit String to a byte

encoded bytes: [-1, -15, 33, -90, -37, 21, 85, 14, -20]   // length: 9 < 29 (length of contentBytes)

8, Decoding

 int temp = 256 | -1;   // and so on for each byte
 String stringDecode = Integer.toBinaryString(temp);
 stringDecode = stringDecode.substring(stringDecode.length() - 8);

9, If the last chunk in step 6 is shorter than 8 bits and starts with "0", I have the problem I described above.

I hope I have explained the problem clearly. Please have a look at my code on GitHub if you are interested: Huffman Code

Thank you, guys!

Upvotes: 0

Views: 84

Answers (1)

Mark Adler
Mark Adler

Reputation: 112394

You are not even thinking about the Huffman codes correctly, so your Java code and your question are irrelevant.

Each code has a number of bits, and then those bits. This is implied where you show c as having three bits. You need to put those bits into your output stream as that number of bits, in order to realize the compression. Your encoding of one code each into a byte makes no sense, especially if you happen to end up with codes longer than eight bits.
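To make the "number of bits, and then those bits" point concrete, here is a minimal sketch. The class `HuffCode` and its methods are hypothetical names made up for illustration, not something from the question's code; the code `011` for `c` is taken from the question.

```java
// Hypothetical helper: a Huffman code stored as a value plus an explicit bit count.
final class HuffCode {
    final int bits;    // the code's bits, right-aligned
    final int length;  // how many of those bits are significant

    HuffCode(int bits, int length) {
        this.bits = bits;
        this.length = length;
    }

    // Parse a code written as a string of '0' and '1' characters, e.g. "011".
    static HuffCode parse(String s) {
        return new HuffCode(Integer.parseInt(s, 2), s.length());
    }

    // Render the code as text again, restoring leading zeros from the stored length.
    String toBitString() {
        StringBuilder sb = new StringBuilder();
        for (int i = length - 1; i >= 0; i--) {
            sb.append((bits >>> i) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        HuffCode c = HuffCode.parse("011");
        System.out.println(Integer.toBinaryString(c.bits)); // prints 11: the value alone loses the zero
        System.out.println(c.toBitString());                // prints 011: value plus length keeps it
    }
}
```

The integer value 3 cannot tell `11` from `011`; only the stored length can, which is why the leading zero disappears when a code is kept as a bare byte or int.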

This is done properly with the bit operations, shift and or. You build up a stream of bits into a word, and when you have at least eight bits in your word buffer, you write out that one byte. At the end, you write out any remaining bits in the last byte.
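The writing side described above could look roughly like the following sketch. `BitWriter`, `encode`, `questionCodes` and `totalBits` are hypothetical names chosen for this example. Left-aligning and zero-padding the final byte, and keeping a total bit count so a decoder can ignore the padding, are assumptions of this sketch rather than something stated above. The code table is the one from the question's edit.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical bit writer: collects bits in a word buffer and emits whole bytes.
final class BitWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private long buffer = 0;    // pending bits, right-aligned
    private int pending = 0;    // how many bits in buffer are not yet written
    private long totalBits = 0; // total number of bits written so far

    // Append the low `length` bits of `bits`, most significant bit first.
    void write(int bits, int length) {
        buffer = (buffer << length) | (bits & ((1L << length) - 1));
        pending += length;
        totalBits += length;
        while (pending >= 8) {              // flush every complete byte
            pending -= 8;
            out.write((int) (buffer >>> pending) & 0xFF);
        }
        buffer &= (1L << pending) - 1;      // keep only the unwritten bits
    }

    // Write any leftover bits, left-aligned and zero-padded, and return all bytes.
    byte[] finish() {
        if (pending > 0) {
            out.write((int) (buffer << (8 - pending)) & 0xFF);
            pending = 0;
            buffer = 0;
        }
        return out.toByteArray();
    }

    long totalBits() {
        return totalBits;
    }

    // Code table from the question's edit, as strings of '0' and '1'.
    static Map<Character, String> questionCodes() {
        Map<Character, String> codes = new HashMap<>();
        codes.put(' ', "00");
        codes.put('a', "111");
        codes.put('b', "010");
        codes.put('c', "01101");
        codes.put('d', "110");
        codes.put('h', "10");
        codes.put('j', "0111");
        codes.put('k', "01100");
        return codes;
    }

    // Encode text by writing each character's code as (value, length).
    static byte[] encode(String text, Map<Character, String> codes) {
        BitWriter w = new BitWriter();
        for (char ch : text.toCharArray()) {
            String code = codes.get(ch);
            w.write(Integer.parseInt(code, 2), code.length());
        }
        return w.finish();
    }

    public static void main(String[] args) {
        byte[] encoded = encode("aaaa bb c dddd hhhhhhh jjk", questionCodes());
        System.out.println(Arrays.toString(encoded));
        // prints [-1, -15, 33, -90, -37, 21, 85, 14, -20]
    }
}
```

Because the last byte is padded on the right, the real bits keep their position and leading zeros; what the reader needs in addition is the number of valid bits (or an end-of-stream symbol) to know where to stop.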

Then at the other end, you read in bytes, building up a stream of bits in a word with shift and or, and then pull out bits from the stream to decode your Huffman codes as needed, using shift and the and operation.
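A matching reading side might be sketched as follows. This is a simplified illustration, not a tuned decoder: it pulls one bit at a time out of the bytes with shift and AND instead of keeping a multi-bit word buffer, and it matches codes by looking up the bits collected so far in a map, where a real decoder would more likely walk the Huffman tree or use a lookup table. `BitDecoder`, `decode` and `questionTable` are hypothetical names, and passing the number of valid bits explicitly is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical decoder: extracts bits MSB-first and matches them against a prefix code.
final class BitDecoder {
    // Reverse code table from the question's edit: bit string -> character.
    static Map<String, Character> questionTable() {
        Map<String, Character> t = new HashMap<>();
        t.put("00", ' ');
        t.put("111", 'a');
        t.put("010", 'b');
        t.put("01101", 'c');
        t.put("110", 'd');
        t.put("10", 'h');
        t.put("0111", 'j');
        t.put("01100", 'k');
        return t;
    }

    // Decode the first `totalBits` bits of `data`; trailing padding bits are ignored.
    static String decode(byte[] data, int totalBits, Map<String, Character> table) {
        StringBuilder text = new StringBuilder();
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < totalBits; i++) {
            int b = data[i / 8] & 0xFF;           // AND removes the sign extension of the byte
            int bit = (b >>> (7 - i % 8)) & 1;    // shift and AND select one bit, MSB first
            code.append(bit);
            Character ch = table.get(code.toString());
            if (ch != null) {                     // a complete code has been read
                text.append(ch);
                code.setLength(0);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] data = {-1, -15, 33, -90, -37, 21, 85, 14, -20};
        System.out.println(decode(data, 72, questionTable()));            // prints aaaa bb c dddd hhhhhhh jjk
        System.out.println(decode(new byte[] {96}, 5, questionTable()));  // prints k (code 01100, zero kept)
    }
}
```

The second call shows a final byte holding only five valid bits that start with 0; since the bits are read by position rather than via Integer.toBinaryString, the leading zero is never lost.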

Upvotes: 3
