Vincent Ketelaars
Vincent Ketelaars

Reputation: 1109

What happens when objects in a Set are altered to match each other?

As the title suggests I have a question regarding changing objects in Sets such that they become exactly the same (in the eyes of the set). Just curious.

I ask this question with regard to Python, but if it is generalizable feel free to do that.

If I have understood correctly in Python the Set iterable will determine whether objects are 'equal' by equating their hashes. So for objects a and b this would be:

hash(a) == hash(b)

For any object you make you can overwrite the standard hash function, __hash__, to your specific liking.

Suppose you create a hash function that takes several or all the objects in your object and uses the combination of hashes as its own (e.g. by ORing them).

Now if you have several initially different objects in a single Set, and consequently go over that Set and alter the objects within such that their inner objects match, what will happen to the Set? Will they all remain there, or will they get kicked out, or do we need to wait till an operation is performed on the Set? Or do we raise some error somewhere?

Upvotes: 6

Views: 557

Answers (4)

georg
georg

Reputation: 215029

Consider this test:

class A:
    def __init__(self, h):
        self.h = h

    def __hash__(self):
        return self.h

x = A(1)
y = A(2)

a = {x, y}

print x in a, y in a
print a

print "----"

x.h = 2

print x in a, y in a
print a

Result:

True True
set([<__main__.A instance at 0x10d94fd40>, <__main__.A instance at 0x10d94fd88>])
----
False True
set([<__main__.A instance at 0x10d94fd40>, <__main__.A instance at 0x10d94fd88>])

As you can see, the first object x is still there, but the in operator reports it isn't. Why did this happen?

From my understanding, Set objects are implemented using hash tables, and a hash table normally has a structure like this:

 hash_value => list of objects with this hash value
 another_hash_value => list of objects with this hash value

When a Set answers to in requests, it first calculates the hash value for the argument and then tries to locate it in the corresponding list. Our set a is initially like this:

  1 => [x]
  2 => [y]

Now, we change the x's hash and ask the set if the object is there. The set calculates the hash value (which is now 2) tries to locate x in the second list and fails - hence False.

To make things more interesting, let's do

a.add(x)
print x in a, y in a
print a

Result:

True True
set([<__main__.A instance at 0x107cbfd40>, 
     <__main__.A instance at 0x107cbfd88>, 
     <__main__.A instance at 0x107cbfd40>])

Now we have the same object twice in the set! As you can see, there's no automatic adjustment and no errors. Python is a grown-ups' language and always assumes you know what you're doing.

Upvotes: 6

NPE
NPE

Reputation: 500773

You are not allowed to modify a member of a set in a way that changes its hash value.

In Python, you can only store hashable objects in a set. From the documentation (emphasis mine):

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is their id().

If you break this contract (as you are proposing to in your question), the set can't do its job and all bets are off.

The correct way to modify a member of a set is to remove, alter, and re-add. This will behave as you would expect it to.

[the set] will determine whether objects are 'equal' by equating their hashes

That's not quite correct. Comparing the hashes cannot be used to establish that the objects are equal. It can only be used to establish that the objects are unequal. It's a subtle but important distinction.

Upvotes: 5

BartoszKP
BartoszKP

Reputation: 35901

First of all, elements of a set need to be hashable:

The elements of a set must be hashable.

While hashable means:

An object is hashable if it has a hash value which never changes during its lifetime [...]

So as long as you don't change the object in a way that its hash value (result of its __hash__ method) remains constant everything is fine.

It is common in Python that immutable objects are considered hashable, while mutable ones are not:

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are.

Upvotes: 2

Yann Vernier
Yann Vernier

Reputation: 15887

ORing together hashes would create a particularly bad hash function, as you'd get a heavy tendency toward values with more bits set. Still, sets and dictionaries use the hashes for a hash table; collisions are expected, and a deeper comparison is performed for objects with equal hash values. You do however lose the advantage of a hash table - O(1) lookups - if the hash function is bad.

Also as covered by other answers, sets should only hold immutable values. Changing the hash value of an object after inserting it into the sets breaks the conditions for the set type, and operations like checking whether the object is in the set, or even removing the object from the set, will fail. I expect you'll still be able to find it by iterating through the set, however.

Upvotes: 0

Related Questions