Reputation: 963
I have a list of tuples like (id, ) and I want to remove duplicates ids. In the case where there are multiple pairs with the same id, I'd like to keep the one that has an object with a higher score. How could I implement this efficiently?
# For the sake of example - assume that a hashing function is implemented based on the score
class Object
def __init__(self):
score = 0
def __repr__(self):
return f'<Object {self.score}>'
pairs = [(1, <Object 1>), (1, <Object 1>), (3, <Object 7>), (9, <Object 3>), (9, <Object 4>)]
filtered_pairs = [(1, <Object 1>), (3, <Object 7>), (9, <Object 4>)]
I know that I can call set
on the pairs, but that'll only take care of the cases where both the id and score are equivalent (like with Object 1). How can I filter it but in the case where there are matching ids, take the higher score?
I know that I could do groupby from itertools, and implement a sort using the score as the key and then just take the last item from every group, but I'm wondering if there's a more efficient way.
Upvotes: 1
Views: 67
Reputation: 474
You can sort by score, convert to dict (so that max scores are dict values) and convert back to list of tuples:
class Object:
def __init__(self, score):
self.score = score
def __repr__(self):
return f'<Object {self.score}>'
def __gt__(self, other):
return self.score > other.score
pairs = [(1, Object(1)), (1, Object(1)), (3, Object(7)), (9, Object(4)), (9, Object(3))]
filtered_pairs = list(dict(sorted(pairs)).items())
Upvotes: 0
Reputation: 50899
You can use itertools.groupby
to group by the first values and use max on the result
from itertools import groupby
class Object:
def __init__(self, score):
self.score = score
def __repr__(self):
return f'<Object {self.score}>'
pairs = [(1, Object(1)), (1, Object(1)), (3, Object(7)), (9, Object(3)), (9, Object(4))]
filtered_pairs = [max(list(elem), key=lambda x: x[1].score) for grp, elem in groupby(pairs, lambda x: (x[0]))]
print(filtered_pairs)
Output:
[(1, <Object 1>), (3, <Object 7>), (9, <Object 4>)]
Upvotes: 2
Reputation: 31354
Something like this:
from collections import namedtuple
Pair = namedtuple('Pair', ['id', 'score'])
pairs = [Pair(*t) for t in [(1, 1), (1, 1), (3, 7), (9, 3), (9, 4)]]
best_pairs = {}
for p in pairs:
if p.id not in best_pairs or p.score > best_pairs[p.id]:
best_pairs[p.id] = p.score
pairs = [Pair(*t) for t in best_pairs.items()]
print(pairs)
namedtuple
is only in there as a more concise version of your Object
and the conversion back to pairs
as a list of Pairs is only in there in case you don't like your result being the dictionary best_pairs
.
Result:
[Pair(id=1, score=1), Pair(id=3, score=7), Pair(id=9, score=4)]
Upvotes: 0
Reputation: 92450
Since you are considering a set, I'm assuming the original order isn't important. If that's the case, one options is to add a __lt__
method to your class so you can compare objects by score. Then sort the tuples in reverse order, group by the integer, and take the first item from each group. It's easier to see in code than explain:
from itertools import groupby
class myObject:
def __init__(self, score):
self.score = score
def __repr__(self):
return f'<Object {self.score}>'
def __lt__(self, other):
return self.score < other.score
pairs = [(1, myObject(1)), (1, myObject(1)), (3, myObject(7)), (9, myObject(3)), (9, myObject(4))]
[next(v) for k, v in groupby(sorted(pairs, reverse=True), key=lambda x: x[0])]
Result
[(9, <Object 4>), (3, <Object 7>), (1, <Object 1>)]
Upvotes: 1