Reputation: 11628
Consider the following Python interpreter shell session:
>>> class D(dict):
... def __hash__(self):
... return id(self)
...
>>> d1 = D({'a': 'b'})
>>> d2 = D({'a1': 'b1'})
>>> t = {d1: 1, d2: 2}
>>> t[d1]
1
>>> t[d2]
2
Why doesn't dict's __hash__
default to id()
? What led to the design decision of forbidding to use mutable entities as dictionary keys?
Upvotes: 2
Views: 1483
Reputation: 36063
In general it is more useful and intuitive to have equality based on value and not identity. For example, consider the following snippet:
from collections import Counter
words = 'dog cat dog'.split()
print Counter(words) # Counter({'dog': 2, 'cat': 1})
That makes sense. It is expected. Now, if things worked your way, we'd have this:
dicts = [{}, {}]
print Counter(dicts) # Counter({{}: 1, {}: 1})
That is NOT the first thing I would expect. You could explain it to me, but the first time I came across it (especially if I was new to programming) it would probably cause me quite some frustration while debugging. And even after understanding it it would probably catch me off guard once in a while. So while the language could be designed in your way, it would not be user friendly.
Similarly, dictionaries could be hashed based on content, and that would make the above snippet behave more intuitively. But now if I mutate the key I'm in serious trouble. I could be warned not to, but rather have the language protect me from making this kind of mistake than giving me all the freedom I want to shoot myself in the foot.
A better way to go would be to use an immutable dict
, which you can get here.
Upvotes: 1
Reputation: 49330
dict
's __hash__
doesn't default to id()
because it would defeat the fundamental purpose of hash tables.
As linked in this comment on another question's answer:
And see the Python FAQ entry Why must dictionary keys be immutable?
The linked documentation says the following (with additional explanation for those who want to read more):
The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different.
Emphasis added.
Basically, the hash table is only useful if it only holds immutable values. Workarounds like id()
defeat the entire purpose, because (as explained in the bold portion of the above quote) you wouldn't find values that you tried to look up.
Upvotes: 3
Reputation: 282198
Why doesn't dict's
__hash__
default toid()
?
Because that violates the fundamental invariant that equal objects have equal hashes. If dicts used their id
for their hash, then you'd have interactions like the following:
>>> x, y = {}, {}
>>> x == y
True
>>> hash(x) == hash(y)
False
>>> x[{}] = 3
>>> x[{}]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: {}
The behavior would be confusing, inconsistent, and not useful.
Upvotes: 7