Reputation: 101
I am trying to implement a compression/decompression algorithm in Java, and this is my current progress: full source code.
There are just two things I am not entirely sure how to fix:
Upvotes: 3
Views: 262
Reputation: 5095
The tree can be stored in quite a compact way.
I did it once: basically I traversed the tree in an LR (left then right) pattern. At node N, if the left branch is a leaf, write a 1 followed by the symbol that leaf encodes; otherwise write a 0 and descend into that node.
Then do the same for the right branch: if it is a leaf, write a 1 followed by the symbol; if it is an internal node, write a 0 and descend into it.
This tree http://huffman.ooz.ie/?text=asd (thanks to Bernhard!) could be written as 1[D]01[A]1[S], which is 28 bits, or 4 bytes total (e.g. replace [D] with the 8-bit ASCII code 01000100).
Decoding the table is done in the reverse manner. I don't have time to write the code, but it does work, and I believe it must be close to the most compact encoding for the tree (not that I have any evidence of that).
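Since I did not include code, here is a minimal sketch of the idea in Java. The Node class and method names are my own invention, not from the question's source; it serializes to a string of '0'/'1' characters for readability rather than packing real bits.

```java
// Hypothetical minimal Huffman tree node: a leaf carries a symbol,
// an internal node has two non-null children.
class Node {
    final Node left, right;
    final char symbol;
    Node(char symbol) { this.symbol = symbol; this.left = this.right = null; }
    Node(Node left, Node right) { this.left = left; this.right = right; this.symbol = 0; }
    boolean isLeaf() { return left == null && right == null; }
}

public class TreeCodec {
    // LR traversal as described above: for each child, write '1' plus the
    // 8-bit symbol if it is a leaf, otherwise write '0' and recurse.
    static void writeTree(Node n, StringBuilder out) {
        for (Node child : new Node[] { n.left, n.right }) {
            if (child.isLeaf()) {
                out.append('1');
                String bits = Integer.toBinaryString(child.symbol);
                out.append("0".repeat(8 - bits.length())).append(bits);
            } else {
                out.append('0');
                writeTree(child, out);
            }
        }
    }

    // Decoding reads the stream back in the same order: '1' means the next
    // 8 bits are a leaf's symbol, '0' means an internal node follows.
    // pos is a one-element array used as a mutable cursor.
    static Node readTree(String bits, int[] pos) {
        Node[] children = new Node[2];
        for (int i = 0; i < 2; i++) {
            if (bits.charAt(pos[0]++) == '1') {
                char sym = (char) Integer.parseInt(bits.substring(pos[0], pos[0] + 8), 2);
                pos[0] += 8;
                children[i] = new Node(sym);
            } else {
                children[i] = readTree(bits, pos);
            }
        }
        return new Node(children[0], children[1]);
    }

    public static void main(String[] args) {
        // Tree for "asd": root -> (leaf D, internal -> (leaf A, leaf S))
        Node root = new Node(new Node('D'), new Node(new Node('A'), new Node('S')));
        StringBuilder sb = new StringBuilder();
        writeTree(root, sb);
        System.out.println(sb + " (" + sb.length() + " bits)");
        // -> 1010001000101000001101010011 (28 bits), i.e. 1[D]01[A]1[S]

        Node back = readTree(sb.toString(), new int[] { 0 });
        System.out.println(back.left.symbol); // D
    }
}
```

Round-tripping the "asd" tree this way reproduces the 28-bit encoding from the example above.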
Upvotes: 1
Reputation: 14003
Problem 1 cannot be solved. For example, if you have a file with only 3 different bytes and you Huffman-compress it, you create a tree like this: http://huffman.ooz.ie/?text=asd
So the tree stored in the compressed file is automatically bigger than your complete original file, but if you did anything else, it would not be Huffman compression anymore. The same applies to any file with very many different bytes. In general: the more repeated bytes a file contains, the better the compression result.
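The arithmetic behind this can be sketched in a few lines of Java. The treeBits formula here is an assumption based on the compact LR serialization described in the other answer (9 bits per leaf, 1 bit per non-root internal node), not something the question's format prescribes.

```java
import java.util.HashSet;
import java.util.Set;

public class Overhead {
    // Assumed header cost of a compact tree serialization:
    // 9 bits per leaf + 1 bit per non-root internal node,
    // i.e. 10*n - 2 bits for n >= 2 distinct symbols.
    static int treeBits(int n) {
        return 10 * n - 2;
    }

    public static void main(String[] args) {
        byte[] file = { 'a', 's', 'd' }; // 3 bytes = 24 bits of raw input
        Set<Byte> distinct = new HashSet<>();
        for (byte b : file) distinct.add(b);

        int header = treeBits(distinct.size()); // 28 bits
        // Huffman code lengths for 3 equally frequent symbols: 1, 2, 2.
        int payload = 1 + 2 + 2;                // 5 bits
        System.out.println("input: " + file.length * 8 + " bits, "
                + "compressed: " + (header + payload) + " bits");
        // The 28-bit header alone already exceeds the 24-bit input.
    }
}
```

So even before any payload bits, the stored tree outweighs a 3-byte file, which is exactly why tiny inputs grow under Huffman compression.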
The second point is very general, but for some help, look at this code: https://github.com/nayuki/Huffman-Coding/blob/master/src/nayuki/huffmancoding/HuffmanDecompress.java
Upvotes: 2