Reputation: 1169
I want to convert a sequence of numbers to a single number which will retain individual values as well as their position. e.g. the following sequence is provided-
1,6,7,8,9,45,67
here,take for example,if I apply simple addition i.e. 1+6+7+8+9+45+67 then a number will be generated. But from that no. we can't extract the individual numbers with their ordering[i.e. 1,6,7,8,9,...].
Is there any way to achieve this feature without any ambiguous deduction [i.e only 1 unique set of numbers will be extracted from a number.]? Is there any mathematical function which will be helpful to get back the individual elements from that number?
Upvotes: 10
Views: 5851
Reputation: 1
Another answer which occurred to me. Encode each number to balanced ternary, with two bits per trit (e.g., 0=00; +1=01; -1=10). The remaining bit pair (e.g., 11) is the end of element marker, repeated for end of sequence. Con: less space efficient than the prefix code when large values are expected; Pros: 1) more space efficient with mostly small values; 2) encoding/decoding simpler; 3) directly represents negative values.
Upvotes: 0
Reputation: 1
This uses a universal code, such as Elias omega coding (or any prefix code -- but universal codes are prefix codes with some desirable properties). A prefix code encodes a bit sequence (i.e., a number) as a prefix that basically provides the necessary info to determine how many bits constitute the rest of the number.
1) Use the code to represent the number of elements in the sequence. 2) Then use the code to represent each element.
Upvotes: 0
Reputation: 11
Here is a basic PHP implementation of the Godel numbering described by Steve Jessop above:
<?php
$sec = array(5,9,8,4);
$n = count($sec);
$max = max($sec);
$enc = encode($sec);
$dec = decode($enc, $n, $max);
echo "Input sequence: " . implode(",", $sec) . "\n";
echo "Output sequence: " . implode(",", $dec) . "\n";
echo "Godel number: " . $enc;
echo (PHP_INT_MAX/$enc < 20 ? " - too big to decode.\n" : "\n");
function encode($sec) {
$primes = array(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53);
$enc = 1;
$i = 0;
foreach ($sec as $v) {
$enc = $enc * pow($primes[$i], $v+1);
$i++;
}
return $enc;
}
function decode($enc, $n, $max) {
$primes = array(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53);
$sec = array();
for ($i = 0; $i < $n; $i++) {
for ($v = 2; $v <= $max+1; $v++) {
if ($enc/pow($primes[$i], $v) != round($enc/pow($primes[$i], $v))) {
break;
}
}
$sec[] = $v-2;
}
return $sec;
}
?>
This script works only for small numbers in small sequences. It can not handle very large numbers, I guess same as any other implementation. It is even easier and more efficient to store digits concatenating them as they are: [01,05,15,17] -> 1051517.
Upvotes: 1
Reputation: 150108
You can convert this to a base-N number, where N is one larger than the largest value that will appear in your input sequence.
UPDATE
Based on the various comments, I would like to offer an alternative solution that may be easier to implement. You can consider the sequence to be a UTF-8 encoded string and use Huffman coding with a custom dictionary to achieve a compact representation.
The custom dictionary allows you to store very common characters with very few bits (for example, the sequence separator ',' and the individual characters '0'..'9' can be stored with as few as 3 bits, but also other numbers that you find to be statistically likely to occur can be stored in a short bit sequence. For example, if you find that "42" occurs frequently, you could store "42" in just a few bits.
If you only assign special codes to ',' and '0' to '9', you will average less than 4 bits per character in the input string while retaining the comma separating sequence members. Finding common, multi-character substrings and adding them to the dictionary will only improve on that ratio.
Using a custom dictionary also means that you do not have to store the dictionary in the header of the compressed data, since it is well-known to you.
I did something just like this using SharpZipLib
http://www.icsharpcode.net/opensource/sharpziplib/
http://community.sharpdevelop.net/forums/p/8255/23219.aspx
It is also easy to do with zlib
Compressing small piece of data
Upvotes: 10
Reputation: 279245
It's mathematically possible to do it for finite sequences, but not very practical because the numbers required get very big very quickly: there are 677 (about 242) different length-7 sequences of integers from 1 ... 67, let alone longer sequences and larger integers.
For a simple example of such a function, map the sequence [1,6,7,8,9,45,67] to the value 21 * 36 * 57 * 78 * 119 * 1345 * 1767. The bases are the prime numbers, the powers are the elements in the sequence.
The reverse mapping is computed by division -- the number of times you can divide your value by 2
is the first element in the sequence, etc. The largest prime factor of the value tells you how long the sequence is.
If you want to allow 0
in the sequence as well as positive numbers, then add 1 to all the elements when you raise the primes to the powers. Or alternatively use the power of 2
to give the length of the sequence, then start encoding the elements starting with 3
.
Goedel used encodings like this in his proof of his Incompleteness Theorems.
As Kendall Frey says, it is not possible to define a function that maps each infinite sequence of integers to a different integer. This is a consequence of Cantor's proof that the power set of the natural numbers is uncountable: you can't even injectively map all infinite sequences of elements from {true, false}
to the integers, let alone all infinite sequences of elements from the integers.
For more practical approaches, think in terms of encoding your sequence of integers as a sequence of bytes, rather than as a number. A finite sequence of bytes can easily be considered to be a binary value, hence it's a number, you just don't really use it as such. A common representation of your example sequence is the sequence of bytes: [1,6,7,8,9,45,67]
, used for example in JSON. This is a 136-bit number. The mathematical function to reverse this mapping involves arithmetic modulo powers of 256, subtraction of the number 48, multiplication by 10, and suchlike :-)
Upvotes: 5
Reputation: 44326
You can't, if the range of your numbers is infinite.
The power set of natural numbers in uncountable. This means that you can't provide a mapping between sets of numbers and numbers.
What you could do if your numbers are limited to, say 32 bits, is concatenate the numbers into a long binary number, and store them as a sequence of bytes, maybe as a BigNum.
Upvotes: 1
Reputation: 12867
Let's say your sequence is called s
and I'll define len(n)
to be the number of digits in n.
Then the first digit of your result is len(s[0])
,
and the following len(s[0])
digits are the number s[0]
;
then you append len(s[1])
and s[1]
, and so on.
This works for numbers with at most 9 digits.
Upvotes: 2
Reputation: 24865
Updated to check the 0,1 case.
Separate the different numbers by 001.
To avoid confusion with 00 inside your numbers, each time a 0 appears in your numbers, replace it with 01.
To decode, split by 001. Replace all 01 by 0.
Upvotes: 0