Reputation: 10558
I'm trying to create a custom object that behaves properly in set operations.
I've generally got it working, but I want to make sure I fully understand the implications. In particular, I'm interested in the behavior when there is additional data in the object that is not included in the equal / hash methods. It seems that in the 'intersection' operation, it returns the set of objects that are being compared to, where the 'union' operations returns the set of objects that are being compared.
To illustrate:
class MyObject:
def __init__(self,value,meta):
self.value = value
self.meta = meta
def __eq__(self,other):
return self.value == other.value
def __hash__(self):
return hash(self.value)
a = MyObject('1','left')
b = MyObject('1','right')
c = MyObject('2','left')
d = MyObject('2','right')
e = MyObject('3','left')
print a == b # True
print a == c # False
for i in set([a,c,e]).intersection(set([b,d])):
print "%s %s" % (i.value,i.meta)
#returns:
#1 right
#2 right
for i in set([a,c,e]).union(set([b,d])):
print "%s %s" % (i.value,i.meta)
#returns:
#1 left
#3 left
#2 left
Is this behavior documented somewhere and deterministic? If so, what is the governing principle?
Upvotes: 10
Views: 5754
Reputation: 95941
Let's say your objects have two different types of attributes: key attributes and data attributes. In your example, MyObject.value
is a key attribute.
Store all your objects as values in a dictionary, keyed by the key attributes, making sure that only your preferred (e.g. with the oldest timestamp) are entered in the dictionary. Perform your set operations with the same key as used in the dictionary, and retrieve the actual objects from the dictionary:
result= [dict1[k] for k in set_operation_result]
Upvotes: 0
Reputation: 151
Nope, it's not deterministic. The problem is that you've broken equals' and hash's invariant, that two objects are equivalent when they are equal. Fix your object, don't try to be clever and abuse how set's implementation works. If the meta value is part of MyObject's identity, it should be included in eq and hash.
You can't rely on set's intersection to follow any order, so there is no way to easily do what you want. What you would end up doing is taking the intersection by value only, then look through all your objects for an older one to replace it with, for each one. No nice way to do it algorithmically.
Unions are not so bad:
##fix the eq and hash to work correctly
class MyObject:
def __init__(self,value,meta):
self.value = value
self.meta = meta
def __eq__(self,other):
return self.value, self.meta == other.value, other.meta
def __hash__(self):
return hash((self.value, self.meta))
def __repr__(self):
return "%s %s" % (self.value,self.meta)
a = MyObject('1','left')
b = MyObject('1','right')
c = MyObject('2','left')
d = MyObject('2','right')
e = MyObject('3','left')
union = set([a,c,e]).union(set([b,d]))
print union
#set([2 left, 2 right, 1 left, 3 left, 1 right])
##sort the objects, so that older objs come before the newer equivalents
sl = sorted(union, key= lambda x: (x.value, x.meta) )
print sl
#[1 left, 1 right, 2 left, 2 right, 3 left]
import itertools
##group the objects by value, groupby needs the objs to be in order to do this
filtered = itertools.groupby(sl, lambda x: x.value)
##make a list of the oldest (first in group)
oldest = [ next(group) for key, group in filtered]
print oldest
#[1 left, 2 left, 3 left]
Upvotes: 4
Reputation: 46433
Order doesn't appear to matter:
>>> [ (u.value, u.meta) for u in set([b,d]).intersection(set([a,c,e])) ]
[('1', 'right'), ('2', 'right')]
>>> [ (u.value, u.meta) for u in set([a,c,e]).intersection(set([b,d])) ]
[('1', 'right'), ('2', 'right')]
However, if you do this:
>>> f = MyObject('3', 'right')
And add f
to the "right" set:
>>> [ (u.value, u.meta) for u in set([a,c,e]).intersection(set([b,d,f])) ]
[('1', 'right'), ('3', 'right'), ('2', 'right')]
>>> [ (u.value, u.meta) for u in set([b,d,f]).intersection(set([a,c,e])) ]
[('1', 'left'), ('3', 'left'), ('2', 'left')]
So you can see that the behavior depends on the size of the sets (the same effect happens if you union
). It may be dependent on other factors as well. I think you're hunting through the python source if you want to know why.
Upvotes: 1