Reputation: 427
how can I compress a row of integers into something shorter?
E.g. Input: '1 2 4 9 8 5 2 7 6 2 3 4' -> Algorithm -> Output: 'X Y Z'
and can get it back the other way around? ('X Y Z' -> '1 2 4 9 8 5 2 7 6 2 3 4')
Input contains 12 digits max, numbers only. Output can be alphanumeric and should be 3-4 digits max.
Thanks in advance.
Edit: Each input digit 0-9; Output 0-9a-Z
Upvotes: 3
Views: 3391
Reputation: 5559
Similar discussion Compressing a set of large integers
PHP Compress array of bits into shortest string possible
$val = pack('H*', "124985276234");
echo '#'. $val . '#';
print_r(unpack('H*', $val));
die;
#Issue
00011001 => 25
11001 => 25
I was try to implement @Misch algorithm in PHP but some bits when use decbin was wrong and give me bad results when unpacking. Then found pack function and its work similarly. But numbers from 0 to 9 are wrong when unpacking and on 9000000 test 8090899 was decompress with wrong value no collision was found.
set_time_limit(0);
ini_set('memory_limit', '5000M');
ini_set("max_execution_time",0);
$collision = [];
$err = [];
for ($i=0; $i < 9000000; $i++) {
$packed = pack('H*', $i);
$unpacked = unpack('H*', $packed)[1];
if ( array_key_exists($i, $collision) ) {
die("Collision:". $i .' !!!!'. $packed .'!!!!'. $unpacked);
}
if ( $i != $unpacked ) {
$e = "Collision2:". $i .' !!!!'. $packed .'!!!!'. $unpacked . "\n";
#echo $e;
$err[] = $e;
}
$collision[] = $packed;
#echo '#'. $i .'#' . $unpacked . '#' . $unpacked . "#\n";
}
Upvotes: 0
Reputation: 10840
First, you could just use any existing compression algorithm, via some library. However knowing that your input is very specialized, you can also write a special algorithm adapted yo your case.
But let's first analyze how much you can compress this input. To simplify, I'll first consider compressing exactly 12 digits from 0 to 9 (you didn't explicitly write what the input range is however). There are 10^12 possible combinations, which is a little less than 2^40. So what you basically want to do is compress 40 bits.
Let's now analyze how you can compress these 40 bits. If you understand alphanumeric as [0-9A-Z]
, you have 36 characters available. Each character can encode log_2(36)=5.1 bits. Encoding your 40 bits therefore needs 8 alphanumeric characters.
A better alternative would be to use base64. Here, you have 64 characters, which means each character can encode 6 bits, so you could encode your input with only 40/6=6.666 => 7 characters.
If you consider compressing your input to binary, you will obviously need 40 bits. This can be written with 5 8-bit ASCII characters, with 2 32-bit integers or with 1 64-bit integer. However this probably isn't what you want to achieve.
Conclusion: you can't compress data arbitrarily much and the data that you want to compress can't be compressed as much as you like to compress it.
As an example, to encode 12 digits from 0 to 9 into ASCII characters, you could simply print convert them into one big number, convert it to binary, then take this binary number by portions of 8 bit and convert them to ASCII characters.
Input: 1 2 4 9 8 5 2 7 6 2 3 4
One number: 124985276234
Binary: 1110100011001101100111111011101001010
Grouped: 11101 00011001 10110011 11110111 01001010
ASCII: <GS><EM>��J
Note that some ASCII-symbols are not printable. If that is important to you, you'll have to use an alternative encoding, as for example base 64, which only has 64 different characters, but they are all printable.
Upvotes: 5
Reputation: 178431
Unless your input comes from a specific domain, where many inputs are unlikely/unacceptable - you cannot do it.
You can encode 62^4~=1.4*10^7
different serieses with 4 alphanumeric characters.
On the other hand, input of 12 digits can have 10^12 possible different inputs.
From pigeonhole principle - there must be 2 "compressions" that are mapped to the same input.
But, since you should need to recreate the original sequence, you cannot differentiate two identical compressions.
So such a compression does not exist.
In fact, to compress a 12 digits number into 4 characters, you are going to need the alphabet for the characters to be of size 1000:
x^4 = 10^12, x>0
x = 1000
Upvotes: 8