Reputation: 727
I am trying to compute the compression ratio in this way:
1- Compute the original size in bytes:
int size = sizeof(inputarr) / sizeof(inputarr[0]);
int originalsize = size * sizeof(int);
2- Compute the encoded data size for Huffman in this way, as I read on Google (the values are just an example):
value: 1  freq: 3
value: 2  freq: 4
value: 3  freq: 4
(1 * 3) + (2 * 4) + (3 * 4)
int encodedsize = 3 + 8 + 12 = 23 bits / 8 = ~3 bytes.
3- The compression ratio should be:
CR = original size / encoded size
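To make the steps concrete, this is roughly the computation I described, using the small example from step 2 (the array names are just placeholders, not my real code):

#include <stdio.h>

int main(void)
{
    /* toy example from step 2: each distinct value and how often it occurs */
    int values[] = { 1, 2, 3 };
    int freq[]   = { 3, 4, 4 };
    int n = sizeof(values) / sizeof(values[0]);

    /* step 1: original size = number of input elements * sizeof(int) */
    int count = 0;
    for (int i = 0; i < n; i++)
        count += freq[i];
    int originalsize = count * (int)sizeof(int);

    /* step 2: what I compute as the encoded size: sum of value * frequency */
    int bits = 0;
    for (int i = 0; i < n; i++)
        bits += values[i] * freq[i];
    double encodedsize = bits / 8.0;   /* bits to bytes */

    /* step 3: compression ratio */
    printf("original = %d bytes, encoded = %.3f bytes, CR = %.3f\n",
           originalsize, encodedsize, originalsize / encodedsize);
    return 0;
}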
This is my input data:
86, 5, 81, 6, 85, 14, 89, 17, 84, 6, 88, 4, 83, 4, 87, 4, 91, 2, 95, 21, 99, 8, 103, 11, 98, 4, 93, 2, 97, 11, 92, 4, 83, 4, 87, 4, 91, 20, 86, 5, 90, 4, 85, 6, 89, 14, 84, 4, 88, 4, 92, 7, 87, 14, 82, 6, 86, 2, 90, 84, 94, 6, 89, 11, 93, 6, 88, 8, 92, 20, 96, 5, 91, 16, 86, 3, 90, 28, 94, 8, 89, 10, 93, 7, 88, 12, 83, 2, 87, 8, 82, 2, 86, 6, 90, 12, 94, 29, 98, 10, 93, 2, 88, 7, 92, 49, 96, 16, 91, 8, 95, 9, 90, 2, 94, 8, 98, 8, 93, 6, 97, 31, 92, 5, 96, 48, 91, 7, 95, 15, 99, 6, 94, 8, 98, 26, 93, 8, 97, 21, 92, 9, 96, 37, 100, 24, 95, 4, 99, 35, 94, 4, 89, 6, 93, 2, 97, 23, 92, 17, 96, 20, 100, 19,
These are the values and the frequencies that I got from Huffman:
Values:
2 3 4 5 6 7 8 9 10 11 12 14 15 16 17 19 20 21 23 24 26 28 29 31 35 37 48 49 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 103
Frequency:
8 1 12 4 10 4 9 2 2 3 2 3 1 2 2 1 3 2 1 1 1 1 1 1 1 1 1 1 1 2 3 3 2 5 4 5 5 5 5 7 7 6 4 5 4 4 3 2 1
The summation of (value * frequency) = 8499 bits ---> 8499 / 8 = 1062.375 bytes.
The problem I have with this approach is that the size of the encoded data is higher than the size of the input data (originalsize), which should be
(number of values * sizeof(int)) = 164 * 4 = 656 bytes.
That gives CR = 656 / 1062.375, which is less than 1! Where is my error, please?
UPDATE: This is the compressed data. When I use the decompression function, I can get back the original values, but how can I represent it in a file so that it takes less space than the original file (the one with the original values)? Any idea, please?
0111100000010010110101000001011111001010110111100011010100111101100001110100001110110001001111111101000010111001100101000101100000111101111000011001001011001110111011000011101000011101100011100000111100000010111101100000101010010101111110001110110011110111101000100000110111100101010100111100110101111000111001101010010101100111001010100110110111011100000111000000100010010110111111111000010111111010111001011010010010001111000001010011010101100001001100001011000101000110111110100101101010111001111100100001101000111100001110011000101110101001000111000101110001011011111111110110101100111100101100001101101110010100010011111010111010000001110010011010001000101111111111100110111010101100101100001111110011111000110001000100001110111110110111011111011111100011110100111111110110111001001111100111011001010101110000110010010110101110110110110111011000011110000101001
Upvotes: 1
Views: 616
Reputation: 112502
Multiplying the frequency of a symbol by the symbol itself makes no sense.
There appears to be no application of Huffman's algorithm in your question. You are determining how many times each symbol occurs, which is the input to Huffman's algorithm. The output would provide the number of bits for each symbol. You would then multiply the number of bits for each symbol by the frequency, i.e. number of occurrences, of each symbol, and sum that. That gives you the total number of bits in the encoded data.
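For instance, here is a minimal sketch of that computation, assuming the per-symbol code lengths have already been produced by building the Huffman tree (the numbers below are placeholders just to show the arithmetic, not the actual code for your data):

#include <stdio.h>

int main(void)
{
    /* hypothetical Huffman output: number of bits assigned to each symbol */
    int codelen[] = { 2, 3, 3, 4, 4 };    /* placeholder code lengths        */
    int freq[]    = { 12, 8, 10, 4, 4 };  /* occurrence counts, same symbols */
    int n = sizeof(codelen) / sizeof(codelen[0]);

    long bits = 0;
    for (int i = 0; i < n; i++)
        bits += (long)codelen[i] * freq[i];   /* bits per symbol times occurrences */

    printf("encoded symbols: %ld bits = %ld bytes\n", bits, (bits + 7) / 8);
    return 0;
}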
Except that you need to then add the number of bits required to represent the Huffman code that was generated. The decoder would need that to make any sense of the encoded symbols.
This particular data would give about 109 bytes of encoded symbols, which is less than the 164 bytes of original data, assuming each symbol is stored in eight bits. Though that data could be stored in seven bits per symbol, in which case it would be 144 bytes uncompressed.
However, a simple tree representation of the Huffman code is about 63 bytes, bringing the total to 172 bytes. That is larger than the original data. A more efficient representation, used by deflate, can do it with 33 bytes, bringing the total to 142 bytes. That is a little less than the original 164 bytes, and a smidge less than the 144 bytes needed if seven bits per symbol were used in the uncompressed representation.
A representation of the Huffman code is required for decompression, and an efficient one is needed to compute a realistic compression ratio. When computing a compression ratio, the representation of the original uncompressed data is also important.
To answer the update: you put eight bits in each byte. Then those 865 bits will take 109 bytes. However those bits are not sufficient to reconstruct the original data. They do not contain a description of the Huffman code for the decoder.
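For example, here is a sketch of packing a string of '0'/'1' characters like the one in your update into bytes, eight bits per byte (most significant bit first is an arbitrary choice here; the function name is just illustrative):

#include <stdio.h>
#include <string.h>

/* Pack a string of '0'/'1' characters into bytes, MSB first.
   Returns the number of bytes written; the last byte is zero-padded. */
size_t pack_bits(const char *bits, unsigned char *out)
{
    size_t n = strlen(bits);
    size_t nbytes = (n + 7) / 8;
    memset(out, 0, nbytes);
    for (size_t i = 0; i < n; i++)
        if (bits[i] == '1')
            out[i / 8] |= (unsigned char)(0x80 >> (i % 8));
    return nbytes;
}

int main(void)
{
    /* first 16 bits of the compressed data from the update */
    const char *bits = "0111100000010010";
    unsigned char out[8];
    size_t nbytes = pack_bits(bits, out);
    printf("%zu bits -> %zu bytes\n", strlen(bits), nbytes);
    return 0;
}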
Upvotes: 1