Reputation: 414
Say I have this data
my_list_of_tuples = [
('bill', [(4, ['626']), (4, ['253', '30', '626']),
(4, ['253', '30', '626']), (4, ['626']),
(4, ['626']), (4, ['626'])]),
('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
(2, ['6']), (2, ['6']), (2, ['6'])]),
('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])
]
For each person, I want to keep only the items whose inner list is the longest in that person's list of sub-tuples, with duplicates removed, so that I am left with
my_output_list_of_tuples = [
('bill', [(4, ['253', '30', '626'])]),
('sarah', [(2, ['2', '6'])]),
('fred', [(1, ['6']), (1, ['2'])])]
So far I tried
my_output_list_of_tuples = [(x[0], max(x[1], key=lambda tup: len(tup[1]))) for x in my_list_of_tuples]
but that does not work for fred, because the max function only returns one item. I also tried a few attempts with map and lambda but got even less far.
I'm OK to break it up like
for my_list_of_tuples_by_person_name in my_list_of_tuples:
    # Do something with my_list_of_tuples_by_person_name[1]
Any ideas?
Thanks in advance :)
Upvotes: 0
Views: 369
Reputation: 365717
If you want to preserve duplicates like this, you can't just call max, you have to compare each value to the result of max.
The most readable way to do this is probably to build a dict mapping keys to max lengths, and then compare each tuple against that:
result = []
for name, sublist in my_list_of_tuples:
    d = {}
    for key, subsub in sublist:
        if len(subsub) > d.get(key, 0):
            d[key] = len(subsub)
    lst = [(key, subsub) for key, subsub in sublist if len(subsub) == d[key]]
    result.append((name, lst))
You can condense most parts of this down, but it'll probably only make things more opaque and less maintainable. And notice that the naive way to condense a two-pass loop into a single expression (where you calculate max each time through) converts it into a nested (quadratic) loop, so it's going to be even more verbose than you think.
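As a rough illustration of that last point, here is a sketch comparing the two shapes. It assumes one maximum per person is enough, which holds for the sample data because every tuple under a name shares the same key; the names `data`, `naive`, `keep_longest`, and `hoisted` are made up for this example.

```python
data = [
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']), (2, ['6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])]),
]

# Naive condensation: max() is re-evaluated for every tuple, so each
# sublist is scanned once per element (quadratic in its length).
naive = [
    (name, [t for t in sublist
            if len(t[1]) == max(len(s[1]) for s in sublist)])
    for name, sublist in data
]


def keep_longest(sublist):
    """Compute the maximum once, then filter in a second linear pass."""
    longest = max(len(subsub) for _, subsub in sublist)
    return [(key, subsub) for key, subsub in sublist if len(subsub) == longest]


hoisted = [(name, keep_longest(sublist)) for name, sublist in data]

# Both shapes give the same result; duplicates are still preserved here.
assert naive == hoisted
```

Pulling the maximum out into a helper keeps the comprehension short without paying for a repeated scan.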
Since you've completely changed the problem and now apparently want only the longest sublist (presumably picking arbitrarily when there are duplicates, or non-duplicate-but-same-length values?), things are simpler:
result = []
for name, sublist in my_list_of_tuples:
    keysubsub = max(sublist, key=lambda keysubsub: len(keysubsub[1]))
    result.append((name, keysubsub))
But that's basically what you already had. You say the problem with it is "… but that does not work for fred, because the max function only returns one item", but I'm not sure what you want instead of one item.
If what you're looking for is all distinct lists of the maximum length, you can use a set or OrderedSet instead of a list in the first answer. There's no OrderedSet in the stdlib, but this recipe by Raymond Hettinger should be fine for our purposes. But let's do it manually with a set and a list:
result = []
for name, sublist in my_list_of_tuples:
    d = {}
    for key, subsub in sublist:
        if len(subsub) > d.get(key, 0):
            d[key] = len(subsub)
    lst, seen = [], set()
    for key, subsub in sublist:
        if len(subsub) == d[key] and tuple(subsub) not in seen:
            seen.add(tuple(subsub))
            lst.append((key, subsub))
    result.append((name, lst))
I think this last one provides exactly the output your updated question asks for, and doesn't do anything hard to understand to get there.
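For anyone who wants to run it directly, here is a sketch that wraps the same idea in a function and applies it to the sample data from the question. The function name `longest_distinct` is made up for this example, and it deviates slightly from the loop above by tracking `(key, tuple(subsub))` in the set, on the assumption that equal lists under different keys should count as distinct.

```python
def longest_distinct(sublist):
    """Keep tuples whose list is longest for its key, dropping repeats."""
    max_len = {}
    for key, subsub in sublist:
        max_len[key] = max(max_len.get(key, 0), len(subsub))

    kept, seen = [], set()
    for key, subsub in sublist:
        # Lists are unhashable, so a tuple copy goes into the set.
        marker = (key, tuple(subsub))
        if len(subsub) == max_len[key] and marker not in seen:
            seen.add(marker)
            kept.append((key, subsub))
    return kept


my_list_of_tuples = [
    ('bill', [(4, ['626']), (4, ['253', '30', '626']),
              (4, ['253', '30', '626']), (4, ['626']),
              (4, ['626']), (4, ['626'])]),
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
               (2, ['6']), (2, ['6']), (2, ['6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])]),
]

result = [(name, longest_distinct(sublist))
          for name, sublist in my_list_of_tuples]
print(result)
```

Order is preserved because the list, not the set, holds the output; the set only answers "have I seen this already?" in constant time.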
Upvotes: 2
Reputation: 3504
First you define a function
def f(ls):
    max_length = max(len(y) for (x, y) in ls)
    result = []
    for (x, y) in ls:
        if len(y) == max_length and (x, y) not in result:
            result.append((x, y))
    return result
Now call it like this
>>> from pprint import pprint
>>> pprint([(name, f(y)) for name, y in my_list_of_tuples])
[('bill', [(4, ['253', '30', '626'])]),
('sarah', [(2, ['2', '6'])]),
('fred', [(1, ['6']), (1, ['2'])])]
Upvotes: 1
Reputation: 71451
You can use max:
my_list_of_tuples = [('bill', [(4, ['626']), (4, ['253', '30', '626']), (4, ['253', '30', '626']), (4, ['626']), (4, ['626']), (4, ['626'])]), ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']), (2, ['6']), (2, ['6']), (2, ['6'])]), ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])]
final_result = [(a, [(c, d) for c, d in b if len(d) == max(map(len, [h for _, h in b]))]) for a, b in my_list_of_tuples]
new_result = [(a, [c for i, c in enumerate(b) if c not in b[:i]]) for a, b in final_result]
Output:
[('bill', [(4, ['253', '30', '626'])]), ('sarah', [(2, ['2', '6'])]), ('fred', [(1, ['6']), (1, ['2'])])]
Upvotes: 1