Daniel
Daniel

Reputation: 24805

Generating ids for a set of integers

Background:

I'm working with permutations of the sequence of integers {0, 1, 2 ... , n}. I have a local search algorithm that transforms a permutation in some systematic way into another permutation. The point of the algorithm is to produce a permutation that minimises a cost function. I'd like to work with a wide range of problems, from n=5 to n=400.

The problem:

To reduce search effort I need to be able to check if I've processed a particular permutation of integers before. I'm using a hash table for this and I need to be able to generate an id for each permutation which I can use as a key into the table. However, I can't think of any nice hash function that maps a set of integers into a key such that collisions do not occur too frequently.

Stuff I've tried:

I started out by generating a sequence of n prime numbers and multiplying the ith number in my permutation with the ith prime then summing the results. The resulting key however produces collisions even for n=5.

I also thought to concatenate the values of all numbers together and take the integer value of the resulting string as a key but the id quickly becomes too big even for small values of n. Ideally, I'd like to be able to store each key as an integer.

Does stackoverflow have any suggestions for me?

Upvotes: 9

Views: 2723

Answers (10)

heghogbbb
heghogbbb

Reputation: 249

get two permutations of same series of numbers {1,.., n}, construct a mapping tupple, (id, permutation1[id], permutation2[id]), or (id, f1(id), f2(id)); you will get an unique map by {f3(id)| for tuple (id, f1(id), f2(id)) , from id, we get a f2(id), and find a id' from tuple (id',f1(id'),f2(id')) where f1(id') == f2(id)}

Upvotes: 0

Dolphin
Dolphin

Reputation: 4772

Similar to Bojan's post it seems like the best way to go is to have a deterministic order to the permutations. If you process them in that order then there is no need to do a lookup to see if you have already done any particular permutation.

Upvotes: 0

John Fouhy
John Fouhy

Reputation: 42223

Prime powers would work: if p_i is the ith prime and a_i is the ith element of your tuple, then

p_0**a_0 * p_1**a_1 * ... * p_n**a_n

should be unique by the Fundamental Theorem of Arithmetic. Those numbers will get pretty big, though :-)

(e.g. for n=5, (1,2,3,4,5) will map to 870,037,764,750 which is already more than 32 bits)

Upvotes: 0

Kamarey
Kamarey

Reputation: 11079

Not relates directly to the question, but as an alternative solution you may use Trie tree as a look up structure. Trie trees are very good for strings operations, its implementation relatively easy and it should be more faster (max of n(k) where k is length of a key) than hashset for a big amount of long strings. And you aren't limited in key size( such in a regular hashset in must int, not bigger). Key in your case will be a string of all numbers separated by some char.

Upvotes: 0

Lasse V. Karlsen
Lasse V. Karlsen

Reputation: 391664

Judging by your question, and the comments you've left, I'd say your problem is not possible to solve.

Let me explain.

You say that you need a unique hash from your combination, so let's make that rule #1:

  • 1: Need a unique number to represent a combination of an arbitrary number of digits/numbers

Ok, then in a comment you've said that since you're using quite a few numbers, storing them as a string or whatnot as a key to the hashtable is not feasible, due to memory constraints. So let's rewrite that into another rule:

  • 2: Cannot use the actual data that were used to produce the hash as they are no longer in memory

Basically, you're trying to take a large number, and store that into a much smaller number range, and still have uniqueness.

Sorry, but you can't do that.

Typical hashing algorithms produce relatively unique hash values, so unless you're willing to accept collisions, in the sense that a new combination might be flagged as "already seen" even though it hasn't, then you're out of luck.

If you were to try a bit-field, where each combination has a bit, which is 0 if it hasn't been seen, you still need large amounts of memory.

For the permutation in n=20 that you left in a comment, you have 20! (2,432,902,008,176,640,000) combinations, which if you tried to simply store each combination as a 1-bit in a bit-field, would require 276,589TB of storage.

You're going to have to limit your scope of what you're trying to do.

Upvotes: 6

Noon Silk
Noon Silk

Reputation: 55172

How fast does it need to be?

You could always gather the integers as a string, then take the hash of that, and then just grab the first 4 bytes.

For a hash you could use any function really, like MD5 or SHA-256.

Upvotes: 3

Bojan Resnik
Bojan Resnik

Reputation: 7388

As others have suggested, you can use hashing to generate an integer that will be unique with high probability. However, if you need the integer to always be unique, you should rank the permutations, i.e. assign an order to them. For example, a common order of permutations for set {1,2,3} is the lexicographical order:

  1. 1,2,3
  2. 1,3,2
  3. 2,1,3
  4. 2,3,1
  5. 3,1,2
  6. 3,2,1

In this case, the id of a permutation is its index in the lexicographical order. There are other methods of ranking permutations, of course.

Making ids a range of continuous integers makes it possible to implement the storage of processed permutations as a bit field or a boolean array.

Upvotes: 3

Zed
Zed

Reputation: 57678

Zobrist hashing might work for you. You need to create an NxN matrix of random integers, each cell representing that element i is in the jth position in the current permutation. For a given permutation you pick the N cell values, and xor them one by one to get the permutation's key (note that key uniqueness is not guaranteed).

The point in this algorithm is, that if you swap to elements in your permutations, you can easily generate the new key from the current permutation by simply xor-ing out the old and xor-ing in the new positions.

Upvotes: 8

grenade
grenade

Reputation: 32169

You could MD5 hash a comma separated string containg your ints.

In C# it would look something like this (Disclaimer: I have no compiler on the machine I'm using today):

using System;
using System.Security.Cryptography;
using System.Text;

public class SomeClass {
    static Guid GetHash(int[] numbers) {
        string csv = string.Join(',', numbers);
        return new Guid(new MD5CryptoServiceProvider().ComputeHash(Encoding.ASCII.GetBytes(csv.Trim())));
    }
}

Edit: What was I thinking? As stated by others, you don't need a hash. The CSV should be sufficient as a string Id (unless your numbers array is big).

Upvotes: 2

Victor Sorokin
Victor Sorokin

Reputation: 12006

Convert each number to String, concatenate Strings (via StringBuffer) and take contents of StringBuffer as a key.

Upvotes: 0

Related Questions