Herman Lee Tzer Her
Herman Lee Tzer Her

Reputation: 101

Needing some Huffman assistance

I am trying to implement a compression/decompression algorithm in Java, and this is my current progress: full source code.

Just have two things I am not entirely sure on how to fix:

  1. When I compress files its working for everything except files with small number of bytes. A friend told me the problem is that when we compress files, we will need the tree, and have to store it in the file itself, thus making the compressed file bigger due to the tree. Any pointers as to how to fix that?
  2. Not sure how to implement decompression at all.

Upvotes: 3

Views: 262

Answers (2)

Wayne Uroda
Wayne Uroda

Reputation: 5095

The tree can be stored in a quite compact way.

I did it once - basically I traversed the tree in a LR pattern. So at node N, if the left branch is a leaf then write a 1 followed by the symbol this leaf encodes, otherwise just write a 0 and descend into that node.

Then do the same for the right branch - if it is a leaf write 1 then symbol, if it is a node write 0 and descend into that node.

This tree http://huffman.ooz.ie/?text=asd (thanks to Bernhard!) could be written as 1[D]01[A]1[S] which is 28bits or 4 bytes total (eg. replace [D] with 01000100).

Decoding the table is done in the reverse manner. I don't have time to write the code, but it does work, and I believe it must be close to the most compact encoding for the tree (not that I have any evidence of that).

Upvotes: 1

bpoiss
bpoiss

Reputation: 14003

Problem 1 can not be solved. For example, if you have a file with only 3 different bytes and you do a huffman compression you create a tree like this: http://huffman.ooz.ie/?text=asd

So, the tree from the compressed file is automatically bigger than your complete old file, but if you would do any other thing, it would not be a huffman compression anymore. Same for all files with very many different bytes. Principally: More equal bytes in a file will bring a better result after compression

Second thing is very generally, but for some help look at this code: https://github.com/nayuki/Huffman-Coding/blob/master/src/nayuki/huffmancoding/HuffmanDecompress.java

Upvotes: 2

Related Questions