Seth Killian
Seth Killian

Reputation: 963

Custom filtering function for python objects

I have a list of tuples like (id, ) and I want to remove duplicates ids. In the case where there are multiple pairs with the same id, I'd like to keep the one that has an object with a higher score. How could I implement this efficiently?


# For the sake of example - assume that a hashing function is implemented based on the score

class Object
   def __init__(self):
       score = 0
   def __repr__(self):
       return f'<Object {self.score}>'

pairs = [(1, <Object 1>), (1, <Object 1>), (3, <Object 7>), (9, <Object 3>), (9, <Object 4>)]

filtered_pairs = [(1, <Object 1>), (3, <Object 7>), (9, <Object 4>)]

I know that I can call set on the pairs, but that'll only take care of the cases where both the id and score are equivalent (like with Object 1). How can I filter it but in the case where there are matching ids, take the higher score?

I know that I could do groupby from itertools, and implement a sort using the score as the key and then just take the last item from every group, but I'm wondering if there's a more efficient way.

Upvotes: 1

Views: 67

Answers (4)

Paulius
Paulius

Reputation: 474

You can sort by score, convert to dict (so that max scores are dict values) and convert back to list of tuples:

class Object:
    def __init__(self, score):
        self.score = score
    def __repr__(self):
        return f'<Object {self.score}>'
    def __gt__(self, other):
        return self.score > other.score


pairs = [(1, Object(1)), (1, Object(1)), (3, Object(7)), (9, Object(4)), (9, Object(3))]
filtered_pairs = list(dict(sorted(pairs)).items())

Upvotes: 0

Guy
Guy

Reputation: 50899

You can use itertools.groupby to group by the first values and use max on the result

from itertools import groupby


class Object:

    def __init__(self, score):
        self.score = score

    def __repr__(self):
        return f'<Object {self.score}>'


pairs = [(1, Object(1)), (1, Object(1)), (3, Object(7)), (9, Object(3)), (9, Object(4))]

filtered_pairs = [max(list(elem), key=lambda x: x[1].score) for grp, elem in groupby(pairs, lambda x: (x[0]))]
print(filtered_pairs)

Output:

[(1, <Object 1>), (3, <Object 7>), (9, <Object 4>)]

Upvotes: 2

Grismar
Grismar

Reputation: 31354

Something like this:

from collections import namedtuple

Pair = namedtuple('Pair', ['id', 'score'])

pairs = [Pair(*t) for t in [(1, 1), (1, 1), (3, 7), (9, 3), (9, 4)]]

best_pairs = {}
for p in pairs:
    if p.id not in best_pairs or p.score > best_pairs[p.id]:
        best_pairs[p.id] = p.score

pairs = [Pair(*t) for t in best_pairs.items()]

print(pairs)

namedtuple is only in there as a more concise version of your Object and the conversion back to pairs as a list of Pairs is only in there in case you don't like your result being the dictionary best_pairs.

Result:

[Pair(id=1, score=1), Pair(id=3, score=7), Pair(id=9, score=4)]

Upvotes: 0

Mark
Mark

Reputation: 92450

Since you are considering a set, I'm assuming the original order isn't important. If that's the case, one options is to add a __lt__ method to your class so you can compare objects by score. Then sort the tuples in reverse order, group by the integer, and take the first item from each group. It's easier to see in code than explain:

from itertools import groupby

class myObject:
    def __init__(self, score):
        self.score = score
    def __repr__(self):
        return f'<Object {self.score}>'
    def __lt__(self, other):
        return self.score < other.score

pairs = [(1, myObject(1)), (1, myObject(1)), (3, myObject(7)), (9, myObject(3)), (9, myObject(4))]

[next(v) for k, v in groupby(sorted(pairs, reverse=True), key=lambda x: x[0])]

Result

[(9, <Object 4>), (3, <Object 7>), (1, <Object 1>)]

Upvotes: 1

Related Questions