Vijay
Vijay

Reputation: 2067

What is best method to compress sequence of just numbers

I have a file with following type of sequence

8596667067212397077404349431816440311306093411908572330624765346447368390322045806914916831283109072368030292593762209252123791942061171616472217102902772202750582672911834098208970365852595911415723265762439878861571164890323784684895745798887472231090706141213174054010 .........

All are 0-9 only chars

Please suggest me best compression method

Upvotes: 0

Views: 500

Answers (2)

Mark Adler
Mark Adler

Reputation: 112284

With base-10 encoding, you can store 19 digits in a 64-bit integer. That gives a compressed size that is 42.1% the size of the ASCII sequence.

Faster for encoding and especially decoding (which requires division for base 10) would be a Huffman code over the digits, assuming equal probability. That would be three bits for six of the digits and four bits for four of the digits. That is 3.4 bits on average per digit, which gives a compressed size of 42.5%.

The theoretical best you could do with base encoding using multiple precision arithmetic (very slow) on equal probability digits is 41.5%.

Upvotes: 1

Rodney
Rodney

Reputation: 3031

It's not totally clear from your question whether you require just a compression method or a standard compressed file format.

You could store them as binary-coded-decimal, which takes 4 bits per digit. That's an exact 50% compression ratio relative to ASCII or UTF-8.

I just tried compressing your example with gzip and it shrank to 60% of the original size - with a larger sequence this would be much more efficient.

Upvotes: 0

Related Questions