Reputation: 4133
The input is a list of tuples: (string, list of integers).
For each unique list of integers (element order matters), I need to keep only the tuple with the shortest string value.
For example:
[('s', [1,2]), ('ss', [1,2]), ('ss', [1,2,3]), ('ss', [1,3,2])] -> [('s', [1,2]), ('ss', [1,2,3]), ('ss', [1,3,2])]
('s', [1,2]) and ('ss', [1,2]) have the same list [1,2]. Select the tuple with the shortest string value, in this case ('s', [1,2]).
What is the right way to group and filter (speed is important)?
In other words: group by list and take the shortest string.
Upvotes: 0
Views: 111
Reputation: 593
One option is to use the itertools groupby function.
Assuming your input list is:
unfiltered = [('s', [1,2]), ('ss', [1,2]), ('ss', [1,2,3]), ('ss', [1,3,2])]
This should give you what you're looking for:
from itertools import groupby
# group by value
key_func = lambda x: x[1]
# sort the input by the grouping criteria
# to conform with groupby's behavior - see comments
unfiltered_sorted = sorted(unfiltered, key=key_func)
# get one item per group: the tuple with the shortest string.
# note: key=len would measure the tuple itself (always 2), not the string
filtered = [min(group, key=lambda t: len(t[0])) for _, group in groupby(unfiltered_sorted, key_func)]
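To illustrate why the input is sorted before grouping: groupby only merges adjacent items that share a key. Here is a minimal sketch of that behavior; the sample data and the helper name list_of are made up for the illustration.

```python
from itertools import groupby

# Hypothetical sample: the two [1, 2] entries are deliberately not adjacent.
pairs = [('s', [1, 2]), ('ss', [1, 3]), ('sss', [1, 2])]

def list_of(pair):
    """Grouping key: the list part of a (string, list) tuple."""
    return pair[1]

# Unsorted input: equal keys that are not neighbours end up in separate groups.
unsorted_keys = [key for key, _ in groupby(pairs, list_of)]

# Sorted input: equal lists become adjacent and collapse into a single group.
sorted_keys = [key for key, _ in groupby(sorted(pairs, key=list_of), list_of)]

print(unsorted_keys)  # [[1, 2], [1, 3], [1, 2]]
print(sorted_keys)    # [[1, 2], [1, 3]]
```

The sort costs O(n log n), so for large inputs it is worth measuring against a dict-based approach.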
Upvotes: 2
Reputation: 604
Try this:
lst = [('s', [1,2]), ('ss', [1,2]), ('ss', [1,2,3]), ('ss', [1,3,2])]
def sort_lst(x):
    # sort key: length of the string part of the tuple
    return len(x[0])
lst.sort(key=sort_lst)
result = [i for e,i in enumerate(lst) if i[1] not in [j[1] for j in lst[:e]]]
print(result)
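Since speed was mentioned: the comprehension above rebuilds a list of all earlier entries for every element, which is quadratic in the input size. A possible variant of the same idea (sort by string length, keep the first occurrence of each list) that tracks seen lists in a set is sketched below. The function name shortest_per_list is made up here, and the lists are converted to tuples only because lists are not hashable.

```python
def shortest_per_list(pairs):
    """Keep, for each distinct list, the tuple with the shortest string."""
    seen = set()
    result = []
    # sorted() is stable, so equally long strings keep their original order.
    for string, numbers in sorted(pairs, key=lambda pair: len(pair[0])):
        key = tuple(numbers)  # lists are unhashable, tuples are hashable
        if key not in seen:
            seen.add(key)
            result.append((string, numbers))
    return result

lst = [('s', [1, 2]), ('ss', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
print(shortest_per_list(lst))
# [('s', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
```

Unlike lst.sort(), this leaves the original list untouched.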
Upvotes: 1
Reputation: 3523
Does order matter?
If not:
input_ = [('s', [1, 2]), ('ss', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
expected = [('s', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
# sort by string length, then list length in reverse.
sorted_input = sorted(input_, key=lambda a: (len(a[0]), len(a[1])), reverse=True)
# then put that in dict, triggering it to overwrite the previous key.
temp_dict = {}
for (string, list_) in sorted_input:
    temp_dict[tuple(list_)] = string
# now convert back
output = [(string, list(list_)) for list_, string in reversed(temp_dict.items())]
print(f"output: {output} \nExpect: {expected}")
output: [('s', [1, 2]), ('ss', [1, 3, 2]), ('ss', [1, 2, 3])]
Expect: [('s', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
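Basically, the trick is to sort so that the shortest string for each list comes last, then let the dict overwrite the earlier entries for the same key.
You could save more time by dropping the tuple -> list conversion at output.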
To add: the result of Python's dict.items() cannot be passed to reversed() in Python <= 3.7. In that case, reverse list(dict.items()) instead.
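If order does matter, one possible approach is a single pass that keeps the shortest string seen so far for each list, with no sorting at all. This is only a sketch: the function name shortest_per_list_ordered is made up, and it assumes Python 3.7+ so that the dict keeps keys in first-insertion order.

```python
def shortest_per_list_ordered(pairs):
    """Single pass: remember the shortest string seen so far for each list."""
    best = {}
    for string, numbers in pairs:
        key = tuple(numbers)  # hashable stand-in for the list
        # Replace only on a strictly shorter string, so the first of equals wins.
        if key not in best or len(string) < len(best[key]):
            best[key] = string
    # Assumes Python 3.7+: dicts iterate in first-insertion order.
    return [(string, list(key)) for key, string in best.items()]

input_ = [('s', [1, 2]), ('ss', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
print(shortest_per_list_ordered(input_))
# [('s', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
```

The output follows the order in which each list first appears in the input, not the position of the winning tuple.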
Upvotes: 1