Niklas R
Niklas R

Reputation: 16870

Python set() and __hash__ confusion

One of the classes I defined is used in a set() to filter out equal objects. But it doesn't work as I expect it, so I obviously understand something wrong.

class Foo(object):

    def __hash__(self):
        return 7

x = set()
x.add(Foo())
assert len(x) == 1
x.add(Foo())
assert len(x) == 1    # AssertionError

I expect the set to consist of only one element, but it has two. Why is that?

Upvotes: 3

Views: 741

Answers (1)

jamylak
jamylak

Reputation: 133574

Hash collisions are know to occur in sets (hash maps), no hash algorithm is good enough to have a unique hash for every item, or else it would take a long time to calculate. When a collision does occur, python falls back to checking the equality of the values with __eq__ to make sure they are not the same.

class Foo(object):

    def __hash__(self):
        return 7
    def __eq__(self, other):
        return True

>>> x = set()
>>> x.add(Foo())
>>> assert len(x) == 1
>>> x.add(Foo())
>>> assert len(x) == 1
>>> 

This is a reason why you see frightening runtimes over here but note that you can expect O(1) amortized membership checks in sets, even though they have O(N) worst case (everything is a hash collision). The worst case is very very very unlikely to occur due to Python's smart implementation.

Upvotes: 7

Related Questions