Reputation: 24621
I have set of objects:
class Test(object):
def __init__(self):
self.i = random.randint(1,10)
res = set()
for i in range(0,1000):
res.add(Test())
print len(res) = 1000
How to remove duplicates from set of objects ?
Thanks for answers, it's work:
class Test(object):
def __init__(self, i):
self.i = i
# self.i = random.randint(1,10)
# self.j = random.randint(1,20)
def __keys(self):
t = ()
for key in self.__dict__:
t = t + (self.__dict__[key],)
return t
def __eq__(self, other):
return isinstance(other, Test) and self.__keys() == other.__keys()
def __hash__(self):
return hash(self.__keys())
res = set()
res.add(Test(2))
...
res.add(Test(8))
result: [2,8,3,4,5,6,7]
but how to save order ? Sets not support order. Can i use list instead set for example ?
Upvotes: 6
Views: 2095
Reputation: 365717
Your first question is already answered by Pavel Anossov.
But you have another question:
but how to save order ? Sets not support order. Can i use list instead set for example ?
You can use a list
, but there are a few downsides:
if foo not in res: res.append(foo)
. Obviously, you can wrap this up in a function instead of writing it repeatedly, but it's still extra work.What you want is something that works like an ordered set
. Or, equivalently, like a list
that doesn't allow duplicates.
If you do all your adds first, and then all your lookups, and you don't need lookups to be fast, you can get around this by first building a list
, then using unique_everseen
from the itertools
recipes to remove duplicates.
Or you could just keep a set
and a list
or elements by order (or a list
plus a set
of elements seen so far). But that can get a bit complicated, so you might want to wrap it up.
Ideally, you want to wrap it up in a type that has exactly the same API as set
. Something like an OrderedSet
akin to collections.OrderedDict
.
Fortunately, if you scroll to the bottom of that docs page, you'll see that exactly what you want already exists; there's a link to an OrderedSet
recipe at ActiveState.
So, copy it, paste it into your code, then just change res = set()
to res = OrderedSet()
, and you're done.
Upvotes: 1
Reputation: 104712
Pavel Anossov's answer is great for allowing your class to be used in a set with the semantics you want. However, if you want to preserve the order of your items, you'll need a bit more. Here's a function that de-duplicates a list, as long as the list items are hashable:
def dedupe(lst):
seen = set()
results = []
for item in lst:
if item not in seen:
seen.add(item)
results.append(item)
return results
A slightly more idiomatic version would be a generator, rather than a function that returns a list. This gets rid of the results
variable, using yield
rather than appending the unique values to it. I've also renamed the lst
parameter to iterable
, since it will work just as well on any iterable object (such as another generator).
def dedupe(iterable):
seen = set()
for item in iterable:
if item not in seen:
seen.add(item)
yield item
Upvotes: 0
Reputation: 279
I think you can easily do what you want with a list as you asked in your first post since you defined the eq operator :
l = []
if Test(0) not in l :
l.append(Test(0))
My 2 cts ...
Upvotes: 0
Reputation: 62908
Your objects must be hashable (i.e. must have __eq__()
and __hash__()
defined) for sets to work properly with them:
class Test(object):
def __init__(self):
self.i = random.randint(1, 10)
def __eq__(self, other):
return self.i == other.i
def __hash__(self):
return self.i
An object is hashable if it has a hash value which never changes during its lifetime (it needs a
__hash__()
method), and can be compared to other objects (it needs an__eq__()
or__cmp__()
method). Hashable objects which compare equal must have the same hash value.Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
If you have several attributes, hash and compare a tuple of them (thanks, delnan
):
class Test(object):
def __init__(self):
self.i = random.randint(1, 10)
self.k = random.randint(1, 10)
self.j = random.randint(1, 10)
def __eq__(self, other):
return (self.i, self.k, self.j) == (other.i, other.k, other.j)
def __hash__(self):
return hash((self.i, self.k, self.j))
Upvotes: 14