Reputation: 11
I want to compress an array of about 10^5 random integers in the range 0 to 2^15. The integers are unsorted and I need to compress them losslessly.
I don't care much about the computation and time needed to run the algorithm; I just want the best compression ratio I can get.
Are there any suggested algorithms for this?
Upvotes: 0
Views: 639
Reputation: 1006
For your exact example: just encode each number as a 15-bit unsigned int and apply bit packing. This is optimal, since you stated that each integer is uniformly random in [0, 2^15), and the Shannon entropy of that distribution is 15 bits per value.
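For illustration, here is a minimal bit-packing sketch in Python (the helper names `pack_15bit`/`unpack_15bit` are my own, not from any library). It stores 10^5 values in 10^5 * 15 / 8 = 187,500 bytes, versus 200,000 bytes for raw 16-bit storage:

```python
def pack_15bit(values):
    """Pack integers in [0, 2**15) into bytes, 15 bits each, MSB-first."""
    buf, nbits, out = 0, 0, bytearray()
    for v in values:
        buf = (buf << 15) | v
        nbits += 15
        while nbits >= 8:
            nbits -= 8
            out.append((buf >> nbits) & 0xFF)
            buf &= (1 << nbits) - 1       # drop bits already emitted
    if nbits:                             # flush the final partial byte
        out.append((buf << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack_15bit(data, count):
    """Recover `count` 15-bit values from bytes produced by pack_15bit."""
    buf, nbits, values = 0, 0, []
    for byte in data:
        buf = (buf << 8) | byte
        nbits += 8
        while nbits >= 15 and len(values) < count:
            nbits -= 15
            values.append((buf >> nbits) & 0x7FFF)
            buf &= (1 << nbits) - 1       # drop bits already consumed
    return values
```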
For a more general solution, try pcodec (https://github.com/mwlon/pcodec). It takes advantage of any smooth-ish structure in the data and compresses near-optimally on shuffled data.
These approaches are both computationally cheap, but more compute won't get you further in this case.
Upvotes: 0
Reputation: 16068
Assuming you don't need to preserve the original order, instead of transmitting the numbers themselves, transmit their counts. With 10^5 values spread over 2^15 possible numbers, each number appears about 3 times on average (10^5 / 2^15 ≈ 3). With 3 bits per number we can count up to 7, so you can build an array of 2^15 * 3 bits and store each number's count in its 3-bit slot. To handle the rare numbers that occur more than 7 times, also send a separate list of those numbers with their counts. The decoder reads the 3-bit array and then overwrites the entries covered by that additional list. A sketch of the encoder is below.
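A minimal sketch of that encoding in Python (the helper name `encode_counts` and the exact overflow format are my own, for illustration only). The fixed count table is 2^15 * 3 / 8 = 12,288 bytes; the decoder would read it back and then apply the overflow list on top:

```python
from collections import Counter

def encode_counts(values):
    """Encode a multiset of ints in [0, 2**15) as a 3-bit count per possible
    value, plus an overflow list for values occurring more than 7 times."""
    counts = Counter(values)
    packed = bytearray(2**15 * 3 // 8)    # 12,288 bytes of 3-bit counters
    overflow = []                         # (value, count) pairs with count > 7
    for value in range(2**15):
        c = counts.get(value, 0)
        if c > 7:
            overflow.append((value, c))
            c = 7                         # capped; true count travels in overflow
        bitpos = value * 3
        byte, off = divmod(bitpos, 8)
        chunk = c << (13 - off)           # place the 3 bits inside a 16-bit window
        packed[byte] |= chunk >> 8
        if chunk & 0xFF:                  # counter straddles a byte boundary
            packed[byte + 1] |= chunk & 0xFF
    return bytes(packed), overflow
```

Since order is discarded, the result is the 12,288-byte table plus a small overflow list (only a small fraction of values exceed a count of 7 with these numbers), far below the ~188 KB needed when order must be preserved.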
Upvotes: 1