Reputation: 543
Consider the following code:
def xorHash(n):
mask = random.getrandbits(32)
def xorIt(x):
return (hash(x) ^ mask) % n
return xorIt
This returns a random hash function which maps elements into a number in {0,1,...,rng-1}.
I want to create a random hash function that maps each element into exactly k
elements in {0,1,...,rng-1} (without repetitions). The example above does the job for k=1.
What is the most efficient way of creating a random hash function which returns a k-sized random subset of {0,1,...,rng-1}?
Upvotes: 2
Views: 468
Reputation: 281683
Seed an RNG with an ordinary integer-valued randomized hash of your data and use it to draw a random sample from the desired range:
def generate_randomized_set_valued_hash_function(n, k):
hashfunc = generate_randomized_hash_function()
def set_valued_hashfunc(x):
rng = random.Random(hashfunc(x))
return set(rng.sample(xrange(n), k))
return set_valued_hashfunc
What RNG and what integer-valued hash function you choose will depend on how strong and how fast you need your set-valued hash function to be.
Upvotes: 1
Reputation: 1703
if the range is relatively small then you can create an array of your items. you will make them in a random order by shuffling the items by generating a random and swapping the first with the item at the generated number.
if your range is relatively big you can generate numbers in the full range and if you get an item that is not unique just try again.
by the way your code has an issue that your numbers are probably not unified distributed because you use the % operator. the reminder technique creates a bias for small numbers, you can read more in the following posts:
How much bias is introduced by the remainder technique
Upvotes: 0