Reputation: 363233
Following on from this question, I'm interested to know when is a python object's hash computed?
__init__
time,__hash__()
is called,__hash__()
is called, orMay this vary depending on the type of the object?
Why does hash(-1) == -2
whilst other integers are equal to their hash?
Upvotes: 32
Views: 5531
Reputation: 2409
From here:
The hash value -1 is reserved (it’s used to flag errors in the C implementation). If the hash algorithm generates this value, we simply use -2 instead.
As integer's hash is integer itself it's just changed right away.
Upvotes: 9
Reputation: 67000
The hash is generally computed each time it's used, as you can quite easily check yourself (see below). Of course, any particular object is free to cache its hash. For example, CPython strings do this, but tuples don't (see e.g. this rejected bug report for reasons).
The hash value -1 signals an error in CPython. This is because C doesn't have exceptions, so it needs to use the return value. When a Python object's __hash__
returns -1, CPython will actually silently change it to -2.
See for yourself:
class HashTest(object):
def __hash__(self):
print('Yes! __hash__ was called!')
return -1
hash_test = HashTest()
# All of these will print out 'Yes! __hash__ was called!':
print('__hash__ call #1')
hash_test.__hash__()
print('__hash__ call #2')
hash_test.__hash__()
print('hash call #1')
hash(hash_test)
print('hash call #2')
hash(hash_test)
print('Dict creation')
dct = {hash_test: 0}
print('Dict get')
dct[hash_test]
print('Dict set')
dct[hash_test] = 0
print('__hash__ return value:')
print(hash_test.__hash__()) # prints -1
print('Actual hash value:')
print(hash(hash_test)) # prints -2
Upvotes: 31
Reputation: 95722
It is easy to see that option #3 holds for user defined objects. This allows the hash to vary if you mutate the object, but if you ever use the object as a dictionary key you must be sure to prevent the hash ever changing.
>>> class C:
def __hash__(self):
print("__hash__ called")
return id(self)
>>> inst = C()
>>> hash(inst)
__hash__ called
43795408
>>> hash(inst)
__hash__ called
43795408
>>> d = { inst: 42 }
__hash__ called
>>> d[inst]
__hash__ called
Strings use option #2: they calculate the hash value once and cache the result. This is safe because strings are immutable so the hash can never change, but if you subclass str
the result might not be immutable so the __hash__
method will be called every time again. Tuples are usually thought of as immutable so you might think the hash could be cached, but in fact a tuple's hash depends on the hash of its content and that might include mutable values.
For @max who doesn't believe that subclasses of str
can modify the hash:
>>> class C(str):
def __init__(self, s):
self._n = 1
def __hash__(self):
return str.__hash__(self) + self._n
>>> x = C('hello')
>>> hash(x)
-717693723
>>> x._n = 2
>>> hash(x)
-717693722
Upvotes: 1