Reputation: 21471
I have two instances of a class that I would like to resolve to the same key in a dictionary:
class CustomClass():
    def __hash__(self):
        return 2

a = CustomClass()
b = CustomClass()
dicty = {a: 1}
Here, a and b are not treated as the same key:
>>> a in dicty
True
>>> b in dicty
False
What exactly is happening with the hash? It seemed like a second instance of CustomClass should match the first one's hash. What is going on that makes these hashes not match?
I just now discovered that the actual class is what is being hashed. So how do I add a custom dictionary key for a class (i.e. when I use an instance as a dictionary key, how should it be stored so that a and b match)?
Note that in this case I do not care about keeping a link to the original object in the dictionary; I can work with some unusable key object. All that matters is that they resolve to the same key.
EDIT:
Perhaps some advice on the actual case I'd like to solve would help.
I have classes containing boolean np.arrays of shape (8,6). I want to hash these such that whenever such an object is put into a dictionary, the comparison takes place on these values. I made a bitarray out of them according to this answer. I noticed it has a __cmp__ there (thanks thefourtheye for showing me I have to look there). However, my class can be updated, so I'd only like to hash the np.array when I'm actually putting it into a dictionary, and not on initialization (storing the hashable bitarray at init is not an option, since the np.array might be updated afterwards and the hash would no longer represent it). I know that whenever I update the np.array I could also update the hashed value, but I'd prefer to hash only once!
Upvotes: 1
Views: 745
Reputation: 1979
You should implement the __eq__ method to make your object hashable.
The definition of hashable from the docs:
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
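To see why the last sentence of that definition matters, here is a small sketch (assuming Python 3; the class name AlwaysEqual is hypothetical) that deliberately breaks the rule: its instances compare equal, but they keep the default identity-based hash, so a set still treats them as two different members.

```python
class AlwaysEqual(object):
    """Deliberately breaks the contract: equal objects, different hashes."""

    def __eq__(self, other):
        # Every AlwaysEqual instance compares equal to every other one
        return isinstance(other, AlwaysEqual)

    # In Python 3, defining __eq__ sets __hash__ to None, so restore the
    # default identity-based hash explicitly to reproduce the broken case
    __hash__ = object.__hash__


x = AlwaysEqual()
y = AlwaysEqual()

print(x == y)        # True: the objects claim to be equal ...
print(len({x, y}))   # 2: ... but their hashes differ, so the set keeps both
```

Because the set only calls __eq__ on entries whose hash matches, x and y are never even compared during insertion.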
Upvotes: 1
Reputation: 239653
You broke the contract between __hash__, __cmp__ and __eq__. Quoting the __hash__ documentation:
If a class does not define a __cmp__() or __eq__() method it should not define a __hash__() operation either; if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable in hashed collections. If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since hashable collection implementations require that a object’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
User-defined classes have __cmp__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
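In Python 3 the second clause of that quote is enforced automatically: a class that defines __eq__ but not __hash__ gets __hash__ set to None, so its instances cannot be used as dictionary keys at all. A minimal sketch (Python 3 assumed; OnlyEq is a hypothetical name):

```python
class OnlyEq(object):
    """Defines __eq__ but not __hash__."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, OnlyEq) and self.value == other.value


print(OnlyEq.__hash__ is None)   # True: Python 3 disables hashing here

try:
    {OnlyEq(1): "x"}             # building the dict needs hash(OnlyEq(1))
except TypeError as exc:
    print("not usable as a key:", exc)
```

Under Python 2 the behaviour differs (new-style classes silently keep the inherited hash), which is why the docs phrase it as a rule rather than an error.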
In your case, the hash value is the same for both objects, and hash collisions are normal in any hash implementation. So Python compares the object being looked up with the stored key using the __eq__ method and finds that they are not equal (by default, an object is only equal to itself). That is why b in dicty returns False.
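A small sketch of that lookup sequence, assuming Python 3 and CPython's dict implementation: the hypothetical class LoggedKey keeps the constant hash from the question but records every __eq__ call, so you can see that the equality check is what decides membership once the hashes match.

```python
eq_calls = []  # records the (self, other) pairs passed to __eq__


class LoggedKey(object):
    def __hash__(self):
        return 2  # same constant hash as in the question

    def __eq__(self, other):
        eq_calls.append((self, other))
        return isinstance(other, LoggedKey)  # all LoggedKey objects are equal


a = LoggedKey()
b = LoggedKey()
dicty = {a: 1}

print(b in dicty)      # True: hashes match, then __eq__ confirms equality
print(len(eq_calls))   # 1 in CPython: one comparison against the stored key a
```

Remove the __eq__ method and the same lookup returns False, exactly as in the question, even though the hashes are still identical.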
So, to fix your problem, define a custom __eq__ method as well, like this:
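class CustomClass(object):
    def __init__(self, data):
        # `data` must be hashable (e.g. a tuple) and must never change
        self.data = data
    def __hash__(self):
        # Find hash value based on the `data`
        return hash(self.data)
    def __eq__(self, other):
        if not isinstance(other, CustomClass):
            return NotImplemented
        return self.data == other.data
    def __ne__(self, other):
        # Python 2 does not derive != from __eq__
        result = self.__eq__(other)
        return result if result is NotImplemented else not result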
Note: the __hash__ value should always be the same for a given object, so please make sure that data is never changed after it is initially assigned. Otherwise you will never be able to get the object from the dictionary, since the hash value of data will be different if it changes at a later point in time.
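If the data can change (as with the boolean arrays in the question), one way around this note is not to make the mutable object hashable at all, and instead build an immutable snapshot only at the moment it is used as a key. The sketch below assumes the grid is stored as nested lists of booleans; the class BoolGrid and its methods set_cell and key are hypothetical names. For a real NumPy array, something like (arr.shape, arr.tobytes()) could play the same role as the tuple snapshot here.

```python
class BoolGrid(object):
    """Mutable grid of booleans; deliberately NOT hashable itself."""

    __hash__ = None  # mutable objects should not be dict keys

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def set_cell(self, i, j, value):
        self.rows[i][j] = bool(value)

    def key(self):
        # Immutable snapshot of the current contents, computed only when
        # the grid is actually used as a dictionary key
        return tuple(tuple(r) for r in self.rows)


g1 = BoolGrid([[True, False], [False, True]])
g2 = BoolGrid([[True, False], [False, True]])

seen = {g1.key(): "first"}
print(g2.key() in seen)   # True: equal contents give equal snapshot keys

g2.set_cell(0, 1, True)   # mutating g2 cannot corrupt the dict
print(g2.key() in seen)   # False: the new contents form a different key
```

The dictionary only ever holds the frozen tuples, so later updates to a grid cannot leave an entry in the wrong hash bucket; the cost is that each lookup rebuilds the snapshot.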
Upvotes: 5
Reputation: 12795
__hash__ just determines which bucket the value will be placed into. Within the bucket, Python calls __eq__ (unless the two keys are the very same object) to make sure it doesn't return an element that just happened to have the same hash but is in fact different, so you need to implement your own __eq__ as well.
class CustomClass():
    def __hash__(self):
        return 2
    def __eq__(self, other):
        return hash(other) == hash(self)

a = CustomClass()
b = CustomClass()
dicty = {a: 1}
print a in dicty
print b in dicty
print "a" in dicty
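One caveat with comparing hashes inside __eq__, sketched below under Python 3: equality then holds for any object that happens to share the hash, not only for other CustomClass instances. Since Python hashes the integer 2 to 2, 2 ends up "in" the dictionary, and comparing with an unhashable object such as a list raises instead of returning False.

```python
class CustomClass(object):
    def __hash__(self):
        return 2

    def __eq__(self, other):
        # Equality based only on hashes, as in the answer above
        return hash(other) == hash(self)


a = CustomClass()
dicty = {a: 1}

print(2 in dicty)  # True: hash(2) == 2, so a.__eq__(2) reports a match

try:
    result = (a == [])  # hash([]) raises because lists are unhashable
except TypeError:
    print("comparing with a list raises TypeError")
```

Checking the type of other first (as the last answer does) avoids both surprises.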
Upvotes: 1
Reputation: 40853
The problem is that a hash function can cause collisions: different objects can produce the same hash value. As a result, the final check to see whether an object is present in a dict is still done using an equality comparison (i.e. x == y). The hash value is used first, to find the candidate objects quickly.
If you want the behaviour you describe, then you must override __eq__ as well.
E.g.:
class CustomClass:
    def __hash__(self):
        return 2
    def __eq__(self, other):
        return type(self) is type(other) and type(self) is CustomClass
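With that __eq__, every CustomClass instance is interchangeable as a key, so a dictionary can hold at most one of them. A short usage sketch (Python 3 assumed; the class is repeated so the snippet runs on its own). Note that in CPython, assigning through b updates the value but keeps the original key object a.

```python
class CustomClass:
    def __hash__(self):
        return 2

    def __eq__(self, other):
        return type(self) is type(other) and type(self) is CustomClass


a = CustomClass()
b = CustomClass()

dicty = {a: 1}
print(b in dicty)              # True
dicty[b] = 2                   # same key as far as the dict is concerned
print(len(dicty))              # 1
print(dicty[a])                # 2
print(next(iter(dicty)) is a)  # True in CPython: the stored key is still a
```

This matches the question's requirement that a and b resolve to the same entry without the dictionary needing to remember which instance was used.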
Upvotes: 1