Reputation: 2067
I have a file with following type of sequence
8596667067212397077404349431816440311306093411908572330624765346447368390322045806914916831283109072368030292593762209252123791942061171616472217102902772202750582672911834098208970365852595911415723265762439878861571164890323784684895745798887472231090706141213174054010 .........
All are 0-9 only chars
Please suggest me best compression method
Upvotes: 0
Views: 500
Reputation: 112284
With base-10 encoding, you can store 19 digits in a 64-bit integer. That gives a compressed size that is 42.1% the size of the ASCII sequence.
Faster for encoding and especially decoding (which requires division for base 10) would be a Huffman code over the digits, assuming equal probability. That would be three bits for six of the digits and four bits for four of the digits. That is 3.4 bits on average per digit, which gives a compressed size of 42.5%.
The theoretical best you could do with base encoding using multiple precision arithmetic (very slow) on equal probability digits is 41.5%.
Upvotes: 1
Reputation: 3031
It's not totally clear from your question whether you require just a compression method or a standard compressed file format.
You could store them as binary-coded-decimal, which takes 4 bits per digit. That's an exact 50% compression ratio relative to ASCII or UTF-8.
I just tried compressing your example with gzip and it shrank to 60% of the original size - with a larger sequence this would be much more efficient.
Upvotes: 0