Reputation: 1147
For any two sequences a, b, where a = [a1,a2,...,an] and b = [b1,b2,...,bn] (0<=ai, bi<=m), I want to find an integer function f that f(a) = f(b) if and only if a, b
have the same elements, without concerning about their orders.For example, if a = [1,1,2,3], b = [2,1,3,1], c = [3,2,1,3], then f(a) = f(b), f(a) ≠ f(b).
I know there is a naive algorithm which first sort the sequence and then map it to an integer. For example, after sorting, we have a = [1,1,2,3], b = [1,1,2,3], c = [1,2,3,3], and suppose that m = 9, using decimal conversion, we will finally have f(a) = f(b) = 1123 ≠ f(c) = 1233. But this will take O(nlog(n)) time using some kind of sorting algorithm (Don't use non comparison sorting algorithms).
Is there a better approach ? Something like hash? An O(n) algorithm?
Note that I also need the function easy to be inversed, which means that we can map an integer back to a sequence (or a set, more concisely).
Update: Forgive my poor description. Here both m, n can be very large (1 million or larger). And I also want the upper bound of f to be quite small, preferably O(m^n).
Upvotes: 3
Views: 1835
Reputation: 8689
Wow, @wildplasser answer is actually super smart. To expand a bit:
Any number can be decomposed in a unique fashion in prime numbers (this is known as the fundamental theorem of arithmetic). His answer relies on that, by building a number for which the input array is a representation of prime factor decomposition. As multiplication is commutative, the exact order of the elements in the array is of no importance, but a given number is associated to one (and only one) sequence of elements.
His solution can be expanded for arbitrary size, e.g. in Python:
import operator
import itertools
import math
class primes(object):
def __init__(self):
self.primes = [2,3,5,7,11]
self.stream = itertools.count(13, 2)
def __getitem__(self, i):
sq = int(math.sqrt(i))
while i >= len(self.primes):
n = self.stream.next()
while any(n % p == 0 for p in self.primes if p <= sq):
n = self.stream.next()
self.primes.append(n)
return self.primes[i]
def prod(itr):
return reduce(operator.mul, itr, 1)
p = primes()
def hash(array):
return prod(p[i] for i in array)
With the expected results:
>>> hash([1,2,2,3,5])
6825
>>> hash([5,3,2,2,1])
6825
Here, 6825 = 3^1 x 5^2 x 7^1 x 13^1
, as 3
is the '1' prime (0-indexed), 5
the '2', etc...
>>> 3**1 * 5**2 * 7**1 * 13**1
6825
Building the number itself is O(n) multiplications, as long as the final result remains in the domain of int
that you are using (unfortunately, I suspect it might get out of hand quite quickly). Building the series of prime number with an Eratosthenes Sieve as I did is asymptotically O(N * log log N), where N is the m-th largest prime. As asymptotically, N ~ m log m, this gives an overall complexity of O(n + m * log m * loglog (m * log m))
Using a similar approach, instead of taking the prime number decomposition, we can also consider the array to be a representation of the decomposition of a number in a base. To be consistent, this base has to be larger than the larger number of similar elements (e.g. for [5, 3, 3, 2, 1]
, base must be > 2 because there are two 3
). Being on the safe side, you can write:
def hash2(array):
n = len(array)
return sum(n**i for i in array)
>>> hash2([1,5,3,2,2])
8070
>>> hash2([2,1,5,2,3])
8070
You can improve this by computing the largest number of similar elements in the array first, but the hash2
function will be a real hash only when used with the same basis, so the prime number decomposition is probably safe if you work with arrays of varying length and composition, as it will ALWAYS return the same unique integer per bag of numbers.
Upvotes: 3
Reputation: 44250
This works for sufficiently small values of m
, and sufficiently small array sizes:
#include <stdio.h>
unsigned primes [] = { 2,3,5,7,11,13,17, 19, 23, 29};
unsigned value(unsigned array[], unsigned count);
int main(void)
{
unsigned one[] = { 1,2,2,3,5};
unsigned two[] = { 2,3,1,5,2};
unsigned val1, val2;
val1 = value(one, 5);
val2 = value(two, 5);
fprintf(stdout, "Val1=%u, Val2=%u\n", val1, val2 );
return 0;
}
unsigned value(unsigned array[], unsigned count)
{
unsigned val, idx;
val = 1;
for (idx = 0; idx < count; idx++) {
val *= primes [ array[idx]];
}
return val;
}
For an explanation, see my description here.
Upvotes: 4