d33tah
d33tah

Reputation: 11628

Why mutable entities cannot be a dictionary key?

Consider the following Python interpreter shell session:

>>> class D(dict):
...   def __hash__(self):
...     return id(self)
... 
>>> d1 = D({'a': 'b'})
>>> d2 = D({'a1': 'b1'})
>>> t = {d1: 1, d2: 2}
>>> t[d1]
1
>>> t[d2]
2

Why doesn't dict's __hash__ default to id()? What led to the design decision of forbidding to use mutable entities as dictionary keys?

Upvotes: 2

Views: 1483

Answers (3)

Alex Hall
Alex Hall

Reputation: 36063

In general it is more useful and intuitive to have equality based on value and not identity. For example, consider the following snippet:

from collections import Counter
words = 'dog cat dog'.split()
print Counter(words)  # Counter({'dog': 2, 'cat': 1})

That makes sense. It is expected. Now, if things worked your way, we'd have this:

dicts = [{}, {}]
print Counter(dicts)  # Counter({{}: 1, {}: 1})

That is NOT the first thing I would expect. You could explain it to me, but the first time I came across it (especially if I was new to programming) it would probably cause me quite some frustration while debugging. And even after understanding it it would probably catch me off guard once in a while. So while the language could be designed in your way, it would not be user friendly.

Similarly, dictionaries could be hashed based on content, and that would make the above snippet behave more intuitively. But now if I mutate the key I'm in serious trouble. I could be warned not to, but rather have the language protect me from making this kind of mistake than giving me all the freedom I want to shoot myself in the foot.

A better way to go would be to use an immutable dict, which you can get here.

Upvotes: 1

TigerhawkT3
TigerhawkT3

Reputation: 49330

dict's __hash__ doesn't default to id() because it would defeat the fundamental purpose of hash tables.

As linked in this comment on another question's answer:

And see the Python FAQ entry Why must dictionary keys be immutable?

The linked documentation says the following (with additional explanation for those who want to read more):

The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different.

Emphasis added.

Basically, the hash table is only useful if it only holds immutable values. Workarounds like id() defeat the entire purpose, because (as explained in the bold portion of the above quote) you wouldn't find values that you tried to look up.

Upvotes: 3

user2357112
user2357112

Reputation: 282198

Why doesn't dict's __hash__ default to id()?

Because that violates the fundamental invariant that equal objects have equal hashes. If dicts used their id for their hash, then you'd have interactions like the following:

>>> x, y = {}, {}
>>> x == y
True
>>> hash(x) == hash(y)
False
>>> x[{}] = 3
>>> x[{}]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: {}

The behavior would be confusing, inconsistent, and not useful.

Upvotes: 7

Related Questions