Reputation: 68486
I have a list of objects (Foo). A Foo object has several attributes. An instance of a Foo object is equivalent (equal) to another instance of a Foo object iff (if and only if) all the attributes are equal.
I have the following code:
class Foo(object):
def __init__(self, myid):
self.myid=myid
def __eq__(self, other):
if isinstance(other, self.__class__):
print 'DEBUG: self:',self.__dict__
print 'DEBUG: other:',other.__dict__
return self.__dict__ == other.__dict__
else:
print 'DEBUG: ATTEMPT TO COMPARE DIFFERENT CLASSES:',self.__class__,'compared to:', other.__class__
return False
import copy
f1 = Foo(1)
f2 = Foo(2)
f3 = Foo(3)
f4 = Foo(4)
f5 = copy.deepcopy(f3) # overkill here (I know), but needed for my real code
f_list = [f1,f2,f3,f4,f5]
# Surely, there must be a better way? (this dosen't work BTW!)
new_foo_list = list(set(f_list))
I often used this little (anti?) 'pattern' above (converting to set and back), when dealing with simple types (int, float, string - and surprisingly datetime.datetime types), but it has come a cropper with the more involved data type - like Foo above.
So, how could I change the list f1 above into a list of unique items - without having to loop through each item and doing a check on whether it already exists in some temporary cache etc etc?.
What is the most pythonic way to do this?
Upvotes: 2
Views: 226
Reputation: 151187
First, I want to emphasize that using set
is certainly not an anti-pattern. set
s eliminate duplicates in O(n) time, which is the best you can do, and way better than the naive O(n^2) solution of comparing every item to every other item. It's even better than sorting -- and indeed, it seems your data structure might not even have a natural order, in which case sorting doesn't make a lot of sense.
The problem with using a set in this case is that you have to define a custom __hash__
method. Others have said this. But whether or not you can do so easily is an open question -- it depends on details about your actual class that you haven't told us. For example, if any attributes of a Foo
object above are not hashable, then creating a custom hash function is going to be difficult, because you'll have to not only write a custom hash for Foo
objects, you'll also have to write custom hashes for every other type of object!
So you need to tell us more about what kinds of attributes your class has if you want a conclusive answer. But I can offer some speculation.
Assuming that a hash function could be written for Foo
objects, but also assuming that that Foo
objects are mutable and so really shouldn't have a __hash__
method, as Niklas B. points out, here is one workable approach. Create a function freeze
that, given a mutable instance of Foo
, returns an immutable collection of the data in Foo
. So for example, say Foo has a dict
and a list
in it; freeze
returns a tuple
containing a tuple
of tuple
s (representing the dict
) and another tuple
(representing the list
). The function freeze
should have the following property:
freeze(a) == freeze(b)
If and only if
a == b
Now pass your list through the following code:
dupe_free = dict((freeze(x), x) for x in dupe_list).values()
Now you have a dupe free list in O(n) time. (Indeed, after adding this suggestion, I saw that fraxel suggested something similar; but I think using a custom function -- or even a method -- (x.freeze(), x)
-- is the better way to go, rather than relying on __dict__
as he does, which can be unreliable. The same goes for your custom __eq__
method, IMO -- __dict__
is not always a safe shortcut for various reasons I can't get into here.)
Another approach would be to use only immutable objects in the first place! For example, you could use namedtuple
s. Here's an example stolen from the python docs:
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22) # instantiate with positional or keyword arguments
>>> p[0] + p[1] # indexable like the plain tuple (11, 22)
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> p # readable __repr__ with a name=value style
Point(x=11, y=22)
Upvotes: 8
Reputation: 35319
Here is an alternative method, just make a dictionary keyed by __dict__.items()
for the instances:
f_list = [f1,f2,f3,f4,f5]
f_dict = dict([(tuple(i.__dict__.items()), i) for i in f_list])
print f_dict
print f_dict.values()
#output:
{(('myid', 1),): <__main__.Foo object at 0xb75e190c>,
(('myid', 2),): <__main__.Foo object at 0xb75e184c>,
(('myid', 3),): <__main__.Foo object at 0xb75e1f6c>,
(('myid', 4),): <__main__.Foo object at 0xb75e1cec>}
[<__main__.Foo object at 0xb75e190c>,
<__main__.Foo object at 0xb75e184c>,
<__main__.Foo object at 0xb75e1f6c>,
<__main__.Foo object at 0xb75e1cec>]
This way you just let the dictionary take care of the uniqueness based on attributes, and can easily retrieve the objects by getting the values.
Upvotes: 1
Reputation: 95368
According to the documentation, you need to define __hash__()
and __eq__()
for your custom class to work correctly with a set
or frozenset
, as both are implemented using hash tables in CPython.
If you implement __hash__
, keep in mind that if a == b
, then hash(a)
must equal hash(b)
. Rather than comparing the whole __dict__
s, I suggest the following more straightforward implementation for your simple class:
class Foo(object):
def __init__(self, myid):
self.myid = myid
def __eq__(self, other):
return isinstance(other, self.__class__) and other.myid == self.myid
def __hash__(self):
return hash(self.myid)
If your object contains mutable attributes, you simply shouldn't put it inside a set or use it as a dictionary key.
Upvotes: 3
Reputation: 49886
Have you tried using a set
(or frozenset
)? It's explicitly for holding a unique set of items.
You'll need to create an appropriate __hash__
method, though. set
(and frozenset
) use the __hash__
method to hash objects; __eq__
is only used on a collision, AFAIK. Accordingly, you'll want to use a hash like hash(frozenset(self.__dict__.items()))
.
Upvotes: 3
Reputation: 1313
If you are allowed you can use a set http://docs.python.org/library/sets.html
list = [1,2,3,3,45,4,45,6]
print set(list)
set([1, 2, 3, 4, 6, 45])
x = set(list)
print x
set([1, 2, 3, 4, 6, 45])
Upvotes: -1