gsamaras
gsamaras

Reputation: 73366

Alter the hash function of a dictionary

Following this question, we know that two different dictionaries, dict_1 and dict_2 for example, use the exact same hash function.

Is there any way to alter the hash function used by the dictionary?Negative answers also accepted!

Upvotes: 5

Views: 2732

Answers (3)

mobiusklein
mobiusklein

Reputation: 1423

Here is a "hash table" on top of a list of lists, where each hash table object is associated with a particular hashing function.

class HashTable(object):
    def __init__(self, hash_function, size=256):
        self.hash_function = hash_function
        self.buckets = [list() for i in range(size)]
        self.size = size

    def __getitem__(self, key):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        for stored_key, stored_value in bucket:
            if stored_key == key:
                return stored_value
        raise KeyError(key)


    def __setitem__(self, key, value):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        # if the key is in the bucket, replace the value and return
        for i, (stored_key, stored_value) in enumerate(bucket):
            if stored_key == key:
                 bucket[i] = (key, value)
                 return
        # otherwise append the key value pair to the bucket
        bucket.append((key, value))

The rest of your application can still see the underlying list of buckets. Your application might require additional metadata to be associated with each bucket, but that would be as simple as defining a new class for the elements of the bucket list instead of a plain list.

Upvotes: 2

Reut Sharabani
Reut Sharabani

Reputation: 31339

I think what you want is a way to create buckets. Based on this I recommend collections.defaultdict with a set initializer as the "bucket" (depends on what you're using it for though).

Here is a sample:

#!/usr/bin/env python

from collections import defaultdict
from itertools import combinations

d = defaultdict(set)

strs = ["str", "abc", "rts"]
for s in strs:
    d[hash(s)].add(s)
    d[hash(''.join(reversed(s)))].add(s)

for combination in combinations(d.values(), r=2):
    matches = combination[0] & combination[1]
    if len(matches) > 1:
        print matches

# output: set(['str', 'rts'])

Two strings ending up in the same buckets here are very likely the same. I've created a hash collision by using the reverse function and using a string and it's reverse as values.

Note that the set will use full comparison but should do it very fast.

Don't hash too many values without draining the sets.

Upvotes: 1

deets
deets

Reputation: 6395

You can't change the hash-function - the dict will call hash on the keys it's supposed to insert, and that's that.

However, you can wrap the keys to provide different __hash__ and __eq__-Methods.

class MyHash(object):
     def __init__(self, v):
         self._v = v

     def __hash__(self):
         return hash(self._v) * -1

     def __eq__(self, other):
         return self._v == other._v

If this actually helps anything with your original problem/question I doubt though, it seems rather a custom array/list-based data-structure might be the answer. Or not.

Upvotes: 5

Related Questions