Reputation: 162
Im trying to compress a sequence of non-negative numbers, where:
Each number appears once only (it means there is total 2^N numbers )
Example for N = 4:
[14, 1, 8, 2, 12, 6, 0, 10, 4, 13, 5, 7, 15, 9, 3, 11]
So normally each number will cost 4 bits and for 16 numbers we will have to use 16x4 = 64 bits to store them.
Currently I have just thought of compressing them as below:
So the compressed data size will be:
Z = 8 * 4 + 4 * 3 + 2 * 2 + 1 * 1 + 1 * 0 = 49 bits
The compression ratio is about 76%, which is pretty good (I think).
But for larger values of N, the ratio seems to be decreased (For N = 2048, the ratio is only 91% )
So I would like to hear your suggestions for better compressing.
Thank you.
Upvotes: 9
Views: 1121
Reputation: 157344
For this specific problem, the most efficient encoding is to view the permutation of [0 .. 2^N-1]
as a numeral in the factorial number system and store the Lehmer code for that permutation.
This gives a requirement of ceil(log2((2^N)!))
bits. For N = 4, this uses 45 bits (70.3%); for N = 11 (2^N = 2048), 19581 bits (86.9%).
The compression ratio worsens as N increases; using the simple approximation log x! >= (x log x) - x + 1
we attain a minimum for log2((2^N)!) / (N 2^N)
of 1 - ((2^N - 1)/(2^N))*(1 / (N * log(2)))
, which approaches 1
as N
tends to infinity.
Given this absolute bound on compression ratio, any approach you can find which is reasonably efficient is worth going for; for values as small as N = 15 it's impossible to do better than 90%.
Upvotes: 2
Reputation: 241701
As is pointed out in comments, the optimal encoding -- if all permutations are equally probable -- is to replace the entire permutation with its index in the enumeration of permutations. Since there are n! possible permutations, the index requires log2n! bits, and therefore the compression ratio from the naive encoding using log2n bits for each element is (log n!)/(n log n).
Using Stirling's approximation, we can rewrite that as (n log n - n + O(log n))/(n log n), which is 1 - 1/(log n) + O(1/n) which evidently asymptotically approaches 1 as n grows. So it is inevitable that the compression ratio will decrease for larger n.
It is not possible to achieve better compression unless not all permutations are equally probable (and you have some information about the probability distribution).
Upvotes: 3
Reputation: 96258
Currently you are using N*2^N bits.
Basically what you have is a permutation of the numbers, and each permutation is unique, and for permutation you can calculate a unique identifier. Since there are (2^N)! permutations, you will only need ceil(log2((2^N)!)) bits. For you example, this is 45 bits.
Upvotes: 2