Writing the huffman tree to file after compression

Question

I'm trying to write a Huffman tree to the compressed file after all the actual compressed data has been written. But I just realized there is a problem. Suppose that once all my actual data has been written, I write two linefeed characters and then the tree. When I read the file back, those two linefeeds (or whatever characters I pick) act as my delimiter. The problem is that the compressed data itself may well contain two consecutive linefeeds, and in that case my delimiter check would fail. I've used two linefeeds as the example, but the same is true for any byte sequence. I could work around the problem by choosing a longer string as the delimiter, but that has two undesirable effects:

1. There is still a small chance that the long string happens to appear in the compressed data.
2. It needlessly bloats a file that is supposed to be compressed.
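
A tiny illustration of the collision, assuming the compressed output is an arbitrary byte string (the payload and tree bytes below are made-up placeholders, not real Huffman output). A reader that splits on the first `\n\n` cuts the data short:

```python
# Made-up bytes standing in for Huffman-coded output: once the bits are
# packed into bytes, 0x0A ("\n") can appear anywhere, even twice in a row.
compressed = bytes([0b10110010, 0x0A, 0x0A, 0b01111001])
tree = b"<serialized tree>"  # hypothetical placeholder for the tree bytes

DELIMITER = b"\n\n"
blob = compressed + DELIMITER + tree

# Naive reader: treat the first occurrence of the delimiter as the boundary.
data_part, _, tree_part = blob.partition(DELIMITER)

print(len(data_part), len(compressed))  # 1 4: the data was cut short
```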

Does anyone have any suggestions on how to separate the compressed data from the tree data?

user1071136 · Accepted Answer

First, write the size of the tree in bytes. Then write the tree itself, and then the compressed contents.

When reading, first read the size, then the tree (you now know exactly how many bytes to read), and then the contents.
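
A minimal Python sketch of that write/read order. It encodes the size as a fixed-width 4-byte big-endian integer via `struct` rather than as text; that header format, the 4 GiB size limit it implies, and the `write_archive`/`read_archive` names are assumptions for illustration. Because the reader knows exactly how many bytes belong to the tree, no delimiter is needed and the data may contain any byte values.

```python
import io
import struct
from typing import BinaryIO, Tuple

HEADER = struct.Struct(">I")  # 4-byte big-endian unsigned int (assumed format)

def write_archive(out: BinaryIO, tree_bytes: bytes, contents: bytes) -> None:
    out.write(HEADER.pack(len(tree_bytes)))  # size of the tree in bytes
    out.write(tree_bytes)                    # the tree itself
    out.write(contents)                      # the compressed contents

def read_archive(inp: BinaryIO) -> Tuple[bytes, bytes]:
    header = inp.read(HEADER.size)
    if len(header) != HEADER.size:
        raise ValueError("file too short for size header")
    (tree_size,) = HEADER.unpack(header)
    tree_bytes = inp.read(tree_size)         # exactly tree_size bytes
    if len(tree_bytes) != tree_size:
        raise ValueError("file ends inside the tree")
    contents = inp.read()                    # everything left is data
    return tree_bytes, contents

# Usage: a "\n\n" inside the tree or the data no longer matters.
buf = io.BytesIO()
write_archive(buf, b"tree\n\nbytes", b"\n\n\x00\xff")
buf.seek(0)
tree, data = read_archive(buf)
print(tree, data)
```

Note that this puts the tree before the data, as the answer suggests, so the decoder has the tree in hand before it starts decoding.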

The size can be written as a string ending with a line feed. This way, you know that the digits before the first line feed are the size of the tree, and any line feeds after that belong to the tree or the data.
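
A sketch of this text-header variant in Python, assuming the tree has already been serialized to bytes by some routine not shown (the `write_archive`/`read_archive` names are hypothetical). The header contains only digits up to its terminating line feed, so the first line feed in the file is unambiguous; line feeds inside the tree or the data are read by count and never searched for.

```python
import io
from typing import BinaryIO, Tuple

def write_archive(out: BinaryIO, tree_bytes: bytes, contents: bytes) -> None:
    # Header: tree size as ASCII decimal digits, terminated by one b"\n".
    out.write(str(len(tree_bytes)).encode("ascii") + b"\n")
    out.write(tree_bytes)
    out.write(contents)

def read_archive(inp: BinaryIO) -> Tuple[bytes, bytes]:
    # readline() stops at the first b"\n". The header has only digits
    # before it, so the first line feed in the file always ends the header.
    header = inp.readline()
    digits = header[:-1]
    if not header.endswith(b"\n") or not digits.isdigit():
        raise ValueError("malformed size header")
    tree_size = int(digits.decode("ascii"))
    tree_bytes = inp.read(tree_size)
    if len(tree_bytes) != tree_size:
        raise ValueError("file ends inside the tree")
    return tree_bytes, inp.read()  # the rest of the file is the data

buf = io.BytesIO()
write_archive(buf, b"tree\n\nbytes", b"\n\n\x00\xff")
print(buf.getvalue()[:4])  # b'11\nt'
buf.seek(0)
print(read_archive(buf))
```

The text header costs only a few bytes and keeps the file inspectable; a fixed-width binary integer would work just as well if you prefer not to parse digits.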
