Gaminic
Gaminic

Reputation: 581

Data Structure / Hash Function to link Sets of Ints to Value

Given n integer id's, I wish to link all possible sets of up to k id's to a constant value. What I'm looking for is a way to translate sets (e.g. {1, 5}, {1, 3, 5} and {1, 2, 3, 4, 5, 6, 7}) to unique values.

Guarantees:

Additionally, it would be nice (but not obligatory) if "neighboring" sets (drop one id, add one id, swap one id, etc) are easy to reach, as well as "all sets that include at least this set".

Any ideas?

Upvotes: 0

Views: 328

Answers (3)

wildplasser
wildplasser

Reputation: 44250

Enumerate using the product of primes.

  • a -> 2
  • b -> 3
  • c -> 5
  • d -> 7
  • et cetera

Now hash(ab) := 6, and hash (abc) := 30

And a nice side effect is that, if "ab" is a subset of "abc", then:

hash(abc) % hash(ab) == 0

and

hash(abc) / hash(ab) == hash(c)

The bad news: You might run into overflow, the 100th prime will probably be around 1000, and 64 bits cannot accomodate 1000**10. This will not affect the functioning as a hash function; only the subset thingy will fail to work. the same method applied to anagrams

The other option is Zobrist-hashing. It is equivalent to the the primes method, but instead of primes you use a fixed set of (random) numbers, and instead of multiplying you use XOR. For a fixed small (it needs << ~70 bits) set like yours, it might be possible to tune the zobrist tables to totally avoid collisions (yielding a perfect hash).

And the final (and simplest) way is to use a (100bit) bitmap, and treat that as a hashvalue (maybe after modulo table size)

And a totally unrelated method is to just build a decision tree on the bits of the bitmap. (the tree would have a maximal depth of k) a related kD tree on bit values

Upvotes: 1

Aki Suihkonen
Aki Suihkonen

Reputation: 20027

Calculate a 'diff' from each set {1, 6, 87, 89} = {1,5,81,2,0,0,...} {1,2,3,4} = { 1,1,1,1,0,0,0,0... };

Then binary encode each number with a variable length encoding and concatenate the bits.

It's hard to compare the sets (except for the first few equal bits), but because there can't be many large intervals in a set, all possible values just might fit into 64 bits. (slack of 16 bits at least...)

Upvotes: 0

Rafael T
Rafael T

Reputation: 15679

May be not the best solution, but you can do the following:

  1. Sort the set from Lowest to highest with a simple IntegerComparator
  2. Add each item of the set to a String

so if you have {2,5,9,4} first Step->{2,4,5,9}; second->"2459"

This way you will get a unique String from a unique set. If you really need to map them to an integer value, you can hash the string after that.

A second way I can think of is to store them in a java Set and simply map it against a HashMap with set as keys

Upvotes: 0

Related Questions