PascalVKooten

Reputation: 21471

Implement custom keys for dictionary so that 2 instances of same class match

I have 2 instances of classes that I would like to resolve to the same key in a dictionary:

class CustomClass(): 
    def __hash__(self):
        return 2

a = CustomClass()        
b = CustomClass()        

dicty = {a : 1}

Here, a and b are not treated as the same key:

>>> a in dicty
True
>>> b in dicty
False

What exactly is happening with the hash? I expected a second instance of CustomClass to match, since both instances return the same hash value. Why does the lookup for b still fail?

I just now discovered that the actual object is what is being hashed. So how do I add a custom dictionary key for a class (i.e. when I use an instance as a dictionary key, how should it be stored so that a and b match)?

Note that in this case I do not care about keeping a link to the original object in the dictionary. I can work with some otherwise unusable key object; all that matters is that a and b resolve to the same key.

EDIT:

Perhaps some advice on the actual case I'd like to solve is required.

I have classes containing boolean np.arrays of shape (8,6). I want to hash these such that whenever such an object is put into a dictionary, the comparison takes place on these values. I made a bitarray out of them according to this answer. I noticed it has a __cmp__ there (thanks thefourtheye for showing I have to look there). However, my class can be updated, so I'd only like to hash the np.array when I'm actually putting it into a dictionary, and not on initialization. If I stored the hashable bitarray at init time, the np.array might be updated later, and the hash would no longer represent the real contents. I know that I could update the hashed value whenever I update the np.array, but I'd prefer to hash only once!

Upvotes: 1

Views: 745

Answers (4)

Deck

Reputation: 1979

You should also implement the __eq__ method so that two instances can compare equal; __hash__ alone is not enough for them to match as keys. The definition of hashable from the docs:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
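As a minimal sketch of the quoted rule (objects that compare equal must have the same hash value), the following uses a hypothetical Point class that is not part of the question. Both methods are derived from the same fields, so equal instances collapse to one set member and one dictionary key.

```python
class Point:
    """Hypothetical value class, used only to illustrate the rule."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Equal when the other object is a Point with the same coordinates.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Derived from the same fields as __eq__, so equal points hash equal.
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)

same_hash = hash(p) == hash(q)   # True
unique_count = len({p, q})       # 1: the set treats p and q as one member
value_via_q = {p: "stored"}[q]   # "stored": q finds the entry stored under p
```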

Upvotes: 1

thefourtheye

Reputation: 239653

You broke the contract between __hash__, __cmp__ and __eq__. Quoting the __hash__ documentation,

If a class does not define a __cmp__() or __eq__() method it should not define a __hash__() operation either; if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable in hashed collections. If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since hashable collection implementations require that a object’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

User-defined classes have __cmp__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

In your case, the hash value is the same for both objects, and hash collisions are common in any hash implementation. So Python compares the object being looked up with the stored key using the __eq__ method and finds that they are not the same object. That is why b in dicty returns False.
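The situation described above can be checked directly. This is a small sketch written for Python 3, where a class that defines only __hash__ still inherits the default identity-based equality; the variable names same_hash, equal and b_found are illustrative only.

```python
class CustomClass:
    def __hash__(self):
        # Every instance reports the same hash, as in the question.
        return 2


a = CustomClass()
b = CustomClass()
dicty = {a: 1}

same_hash = hash(a) == hash(b)  # True: the hashes collide
equal = a == b                  # False: default equality is identity
a_found = a in dicty            # True: a is the stored key itself
b_found = b in dicty            # False: same hash, but b != a
```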

So, to fix your problem, also define a custom __eq__ method, like this:

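class CustomClass():

    def __init__(self, data):
        # `data` must itself be hashable and immutable (e.g. a tuple)
        self.data = data

    def __hash__(self):
        # Find hash value based on the `data`
        return hash(self.data)

    def __eq__(self, other):
        # Objects that compare equal must also produce the same hash
        return isinstance(other, CustomClass) and self.data == other.data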

Note: the __hash__ value should always be the same for a given object. So please make sure that the data is never changed after it is initially assigned. Otherwise you'll never be able to get the object from the dictionary, since the hash value of the data will be different if it changes at a later point in time.
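The hazard in the note can be reproduced with a short sketch. Record is a hypothetical class following the same pattern as the code above, and small integers are used as data only because their hashes are easy to reason about. The expected results assume CPython's dict implementation.

```python
class Record:
    """Hypothetical class whose hash depends on a reassignable attribute."""

    def __init__(self, data):
        self.data = data

    def __hash__(self):
        return hash(self.data)

    def __eq__(self, other):
        return isinstance(other, Record) and self.data == other.data


r = Record(1)
d = {r: "value"}   # the entry is filed under hash(1)

r.data = 2         # mutate the key after insertion

old_value_found = Record(1) in d  # False: right bucket, but r.data is now 2
new_value_found = Record(2) in d  # False: looked up under hash(2), wrong bucket
same_object_found = r in d        # False in CPython: r now hashes to hash(2)
entry_count = len(d)              # 1: the entry still exists but is unreachable by key
```

The entry is still in the dictionary and shows up when iterating over it, but no lookup by key reaches it any more.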

Upvotes: 5

Ishamael

Reputation: 12795

__hash__ just determines which bucket the value will be placed into. Within the bucket, Python always calls __eq__ to make sure it doesn't return an element that just happens to have the same hash but is in fact different, so you need to implement your own __eq__ as well.

class CustomClass():
    def __hash__(self):
        return 2

    def __eq__(self, other):
        return hash(other) == hash(self)


a = CustomClass()     
b = CustomClass()     

dicty = {a : 1}

print(a in dicty)    # True
print(b in dicty)    # True
print("a" in dicty)  # False
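One consequence of comparing by hash alone is worth noting: any object whose hash happens to be 2 will also match. The sketch below shows this with the integer 2 (hash(2) == 2 in CPython) and contrasts it with a type check. StrictClass is a hypothetical name introduced only for this illustration.

```python
class CustomClass:
    def __hash__(self):
        return 2

    def __eq__(self, other):
        # Same comparison as above: equal whenever the hashes match.
        return hash(other) == hash(self)


class StrictClass:
    """Hypothetical variant that also checks the type of the other object."""

    def __hash__(self):
        return 2

    def __eq__(self, other):
        return isinstance(other, StrictClass)


loose = {CustomClass(): 1}
strict = {StrictClass(): 1}

int_matches_loose = 2 in loose               # True: the int 2 hashes to 2
int_matches_strict = 2 in strict             # False: rejected by the type check
instance_matches = StrictClass() in strict   # True: another instance matches
```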

Upvotes: 1

Dunes

Reputation: 40853

The problem is that a hash function can produce collisions: different objects can have the same hash value. As a result, the final check to see whether an object is present in a dict is still done using an equality comparison (i.e. x == y). The hash value is only used first, to find the candidate objects quickly.

If you want the behaviour you describe, then you must override __eq__ as well.

For example:

class CustomClass: 
    def __hash__(self):
        return 2
    def __eq__(self, other):
        return type(self) is type(other) and type(self) is CustomClass
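The order of operations described above (hash first, equality second) can be observed by recording the calls. This is only an illustrative sketch: Traced and the calls list are hypothetical names, and the recorded sequence reflects how CPython performs the lookup.

```python
calls = []  # records which special methods the dict lookup invokes


class Traced:
    def __hash__(self):
        calls.append("hash")
        return 2

    def __eq__(self, other):
        calls.append("eq")
        return type(self) is type(other)


a = Traced()
b = Traced()
d = {a: 1}

calls.clear()            # ignore the hash call made while building the dict
found = b in d           # True: same hash, then __eq__ confirms the match
lookup_calls = list(calls)   # ["hash", "eq"]
```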

Upvotes: 1
