Joylove

Reputation: 414

Filter a list of tuples on the longest item in the tuple

Say I have this data

my_list_of_tuples = [
    ('bill', [(4, ['626']), (4, ['253', '30', '626']),
              (4, ['253', '30', '626']), (4, ['626']),
              (4, ['626']), (4, ['626'])]),
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
               (2, ['6']), (2, ['6']), (2, ['6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])
]

And I want to keep only the items whose list element is the longest in each person's sub-list, with duplicates removed, so that I am left with

my_output_list_of_tuples = [
    ('bill',  [(4, ['253', '30', '626'])]),
    ('sarah',  [(2, ['2', '6'])]),
    ('fred',  [(1, ['6']), (1, ['2'])])]

So far I tried

my_output_list_of_tuples = [(x[0], max(x[1], key=lambda tup: len(tup[1]))) for x in my_list_of_tuples] 

but that does not work for fred, because max only returns one item. I also tried a few attempts with map and lambda but got even less far.
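To see why, note that `max` returns only the first element that attains the maximum key, so later ties are discarded. A small check on fred's list, using names chosen just for this illustration:

```python
# fred's sublist from the sample data
fred = [(1, ['6']), (1, ['2']), (1, ['2'])]

# max() returns the first item with the largest key; ties after it are dropped
longest = max(fred, key=lambda tup: len(tup[1]))
print(longest)  # (1, ['6'])

# Every list here has length 1, so all three items tie for the maximum
max_len = len(longest[1])
ties = [tup for tup in fred if len(tup[1]) == max_len]
print(ties)  # [(1, ['6']), (1, ['2']), (1, ['2'])]
```

So any solution has to find the maximum length first and then filter, rather than rely on `max` alone.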

I'm OK to break it up like

for my_list_of_tuples_by_person_name in my_list_of_tuples:
    # Do something with my_list_of_tuples_by_person_name[1]
    pass

Any ideas?

Thanks in advance :)

Upvotes: 0

Views: 369

Answers (3)

abarnert

Reputation: 365717

If you want to keep every tied item like this (duplicates included), you can't just call max; you have to compare each value against the result of max.

The most readable way to do this is probably to build a dict mapping keys to max lengths, and then compare each tuple against that:

result = []
for name, sublist in my_list_of_tuples:
    # First pass: map each key to the longest list length seen for it
    d = {}
    for key, subsub in sublist:
        d[key] = max(d.get(key, 0), len(subsub))
    # Second pass: keep every entry whose list reaches that length
    lst = [(key, subsub) for key, subsub in sublist if len(subsub) == d[key]]
    result.append((name, lst))
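As a quick sanity check, here is the same two-pass loop run on the question's sample data. It keeps every tied entry, duplicates included, which is why it does not yet match the de-duplicated output the question asks for:

```python
my_list_of_tuples = [
    ('bill', [(4, ['626']), (4, ['253', '30', '626']),
              (4, ['253', '30', '626']), (4, ['626']),
              (4, ['626']), (4, ['626'])]),
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
               (2, ['6']), (2, ['6']), (2, ['6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])
]

result = []
for name, sublist in my_list_of_tuples:
    # First pass: longest list length seen for each key
    d = {}
    for key, subsub in sublist:
        d[key] = max(d.get(key, 0), len(subsub))
    # Second pass: keep every entry that reaches its key's maximum
    lst = [(key, subsub) for key, subsub in sublist if len(subsub) == d[key]]
    result.append((name, lst))

for row in result:
    print(row)
# ('bill', [(4, ['253', '30', '626']), (4, ['253', '30', '626'])])
# ('sarah', [(2, ['2', '6']), (2, ['2', '6'])])
# ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])
```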

You can condense most of this, but it will probably only make things more opaque and less maintainable. Also notice that the naive way to condense a two-pass loop into a single expression (recalculating max for every element) turns it into a nested, quadratic loop, so a one-liner that avoids that cost ends up even more verbose than you'd expect.
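To illustrate the quadratic trap, here is a minimal sketch comparing the two shapes. The helper names `longest_naive` and `longest_two_pass` are hypothetical, and for brevity both use a single maximum per person instead of one per key (the sample data has only one key per person anyway):

```python
def longest_naive(sublist):
    # max() is re-evaluated for every element: O(n**2) for n tuples
    return [(k, v) for k, v in sublist
            if len(v) == max(len(w) for _, w in sublist)]


def longest_two_pass(sublist):
    # Compute the maximum once (0 for an empty list), then filter: O(n)
    max_len = max((len(v) for _, v in sublist), default=0)
    return [(k, v) for k, v in sublist if len(v) == max_len]


sarah = [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
         (2, ['6']), (2, ['6']), (2, ['6'])]
print(longest_naive(sarah) == longest_two_pass(sarah))  # True
```

Both return the same list; only the amount of work differs, which starts to matter once the sub-lists get long.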


Since you've completely changed the problem and now apparently want only the longest sublist (presumably picking arbitrarily when there are duplicates, or non-duplicate-but-same-length values?), things are simpler:

result = []
for name, sublist in my_list_of_tuples:
    keysubsub = max(sublist, key=lambda keysubsub: len(keysubsub[1]))
    result.append((name, keysubsub))

But that's basically what you already had. You say the problem with it is "… but that does not work for fred, because the max function only returns one item", but I'm not sure what you want instead of one item.


If what you're looking for is all distinct lists of the maximum length, you can use a set or OrderedSet instead of a list in the first answer. There's no OrderedSet in the stdlib, but a recipe by Raymond Hettinger should be fine for our purposes. Because lists aren't hashable, each sublist has to be converted to a tuple before it can go into a set. But let's do it manually with a set and a list:
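Another option, assuming Python 3.7 or later (where dicts are guaranteed to preserve insertion order), is to use plain dict keys as an ordered set via `dict.fromkeys`. This swaps in a different technique from the OrderedSet recipe; the helper name `longest_distinct` is hypothetical, and it uses one maximum per person rather than one per key:

```python
def longest_distinct(sublist):
    # Longest list length among this person's entries (0 if there are none)
    max_len = max((len(v) for _, v in sublist), default=0)
    # dict keys act as an insertion-ordered set; lists aren't hashable,
    # so each list is converted to a tuple first
    unique = dict.fromkeys((k, tuple(v)) for k, v in sublist if len(v) == max_len)
    # Convert the inner tuples back to lists to match the input shape
    return [(k, list(v)) for k, v in unique]


fred = [(1, ['6']), (1, ['2']), (1, ['2'])]
print(longest_distinct(fred))  # [(1, ['6']), (1, ['2'])]
```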

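result = []
for name, sublist in my_list_of_tuples:
    # First pass: longest list length per key
    d = {}
    for key, subsub in sublist:
        d[key] = max(d.get(key, 0), len(subsub))
    # Second pass: keep maximal entries, skipping ones already seen;
    # lists aren't hashable, so the set stores (key, tuple) markers
    lst, seen = [], set()
    for key, subsub in sublist:
        marker = (key, tuple(subsub))
        if len(subsub) == d[key] and marker not in seen:
            seen.add(marker)
            lst.append((key, subsub))
    result.append((name, lst))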
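To check that claim, this self-contained run feeds the question's sample data through the de-duplicating loop and compares the result with the question's expected output:

```python
my_list_of_tuples = [
    ('bill', [(4, ['626']), (4, ['253', '30', '626']),
              (4, ['253', '30', '626']), (4, ['626']),
              (4, ['626']), (4, ['626'])]),
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
               (2, ['6']), (2, ['6']), (2, ['6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])
]
my_output_list_of_tuples = [
    ('bill', [(4, ['253', '30', '626'])]),
    ('sarah', [(2, ['2', '6'])]),
    ('fred', [(1, ['6']), (1, ['2'])])]

result = []
for name, sublist in my_list_of_tuples:
    # Longest list length per key
    d = {}
    for key, subsub in sublist:
        d[key] = max(d.get(key, 0), len(subsub))
    # Keep maximal entries, first occurrence only
    lst, seen = [], set()
    for key, subsub in sublist:
        marker = (key, tuple(subsub))
        if len(subsub) == d[key] and marker not in seen:
            seen.add(marker)
            lst.append((key, subsub))
    result.append((name, lst))

print(result == my_output_list_of_tuples)  # True
```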

I think this last one produces exactly the output your updated question asks for, and doesn't do anything hard to understand to get there.

Upvotes: 2

Elmex80s

Reputation: 3504

First you define a function

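def f(ls):
    # Length of the longest list among this person's entries (0 if ls is empty)
    max_length = max((len(y) for (x, y) in ls), default=0)

    result = []

    for (x, y) in ls:
        # Keep entries of maximal length, skipping exact duplicates already kept
        if len(y) == max_length and (x, y) not in result:
            result.append((x, y))

    return result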

Now call it like this

>>> from pprint import pprint
>>> pprint([(name, f(y)) for name, y in my_list_of_tuples])
[('bill', [(4, ['253', '30', '626'])]),
 ('sarah', [(2, ['2', '6'])]),
 ('fred', [(1, ['6']), (1, ['2'])])]
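The `(x, y) not in result` check works even though each item contains a list, because membership testing on a list compares elements with `==`. The trade-off is that every check scans `result`, so the set-based approach in the previous answer (with lists converted to tuples) scales better for long inputs. A quick illustration, with variable names chosen only for this example:

```python
result = [(1, ['6']), (1, ['2'])]

# `in` on a list tests each element with ==, and tuples and lists compare
# element by element, so an item containing a list can still be found
found = (1, ['2']) in result
print(found)  # True

# A set needs hashable elements, and a tuple that contains a list is not hashable
try:
    hash((1, ['2']))
    hashable = True
except TypeError:
    hashable = False
print(hashable)  # False
```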

Upvotes: 1

Ajax1234

Reputation: 71451

You can use max:

my_list_of_tuples = [('bill', [(4, ['626']), (4, ['253', '30', '626']), (4, ['253', '30', '626']), (4, ['626']), (4, ['626']), (4, ['626'])]), ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']), (2, ['6']), (2, ['6']), (2, ['6'])]), ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])]
# Keep the (c, d) pairs whose list d is as long as the longest list for that person
final_result = [(a, [(c, d) for c, d in b if len(d) == max(map(len, [h for _, h in b]))]) for a, b in my_list_of_tuples]
# Drop repeats, keeping the first occurrence of each pair
new_result = [(a, [c for i, c in enumerate(b) if c not in b[:i]]) for a, b in final_result]

Output:

[('bill', [(4, ['253', '30', '626'])]), ('sarah', [(2, ['2', '6'])]), ('fred', [(1, ['6']), (1, ['2'])])]

Upvotes: 1
