Oleg Dats

Reputation: 4133

How do I group and filter the following list of objects?

The input is a list of tuples: (string, list of integers).

I need to select only the tuples with a unique list of integers (order matters). When several tuples share the same list, keep the one with the shortest string value.

For example:

[('s', [1,2]), ('ss', [1,2]), ('ss', [1,2,3]), ('ss', [1,3,2])] -> [('s', [1,2]), ('ss', [1,2,3]), ('ss', [1,3,2])]

('s', [1,2]) and ('ss', [1,2]) have the same list [1,2], so select the tuple with the shortest string value, in this case ('s', [1,2]).

What is the right way to group and filter (speed is important)?

In other words: group by list and take the shortest string.

Upvotes: 0

Views: 111

Answers (3)

Shlomo Gottlieb

Reputation: 593

One option is to use the itertools groupby function.

Assuming your input list is:

unfiltered = [('s', [1,2]), ('ss', [1,2]), ('ss', [1,2,3]), ('ss', [1,3,2])]

This should give you what you're looking for:

from itertools import groupby

# group by value
key_func = lambda x: x[1]

# sort the input by the grouping criteria first, because
# groupby only groups consecutive items that share a key
unfiltered_sorted = sorted(unfiltered, key=key_func)

# get one item per group: the one whose string (x[0]) is shortest.
# len() of the tuple itself is always 2, so the key must look at x[0]
filtered = [min(group, key=lambda x: len(x[0])) for _, group in groupby(unfiltered_sorted, key_func)]

Upvotes: 2

SimoN SavioR

Reputation: 604

Try this:

lst = [('s', [1,2]), ('ss', [1,2]), ('ss', [1,2,3]), ('ss', [1,3,2])]

def sort_lst(x):
    return len(x[0])

lst.sort(key=sort_lst)


result = [i for e,i in enumerate(lst) if i[1] not in [j[1] for j in lst[:e]]]
print(result)
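
Note that the membership test above rebuilds a list of all earlier values for every element, so it grows quadratically with the input size. Since speed matters here, a possible linear-time variant is sketched below. It assumes the integer lists can be converted to tuples so they can be stored in a set; the names `dedupe_shortest` and `seen` are illustrative and not part of the code above.

```python
def dedupe_shortest(pairs):
    """Keep one tuple per distinct list, preferring the shortest string."""
    seen = set()
    result = []
    # sorted() is stable, so strings of equal length keep their input order
    for string, numbers in sorted(pairs, key=lambda p: len(p[0])):
        key = tuple(numbers)  # lists are unhashable, tuples are hashable
        if key not in seen:
            seen.add(key)
            result.append((string, numbers))
    return result


lst = [('s', [1, 2]), ('ss', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
print(dedupe_shortest(lst))
```

The set lookup is constant time on average, so the cost should be dominated by the sort.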

Upvotes: 1

jupiterbjy

Reputation: 3523

Does order matter?

If not:

input_ = [('s', [1, 2]), ('ss', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
expected = [('s', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]


# sort by string length, then list length in reverse.
sorted_input = sorted(input_, key=lambda a: (len(a[0]), len(a[1])), reverse=True)

# then put that in dict, triggering it to overwrite the previous key.
temp_dict = {}

for (string, list_) in sorted_input:
    temp_dict[tuple(list_)] = string

# now convert back
output = [(string, list(list_)) for list_, string in reversed(temp_dict.items())]

print(f"output: {output} \nExpect: {expected}")
output: [('s', [1, 2]), ('ss', [1, 3, 2]), ('ss', [1, 2, 3])] 
Expect: [('s', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]

Basically, the trick is:

  1. Sort by length in descending order.
  2. Put the items into a dict. Convert each list to a tuple and use it as the key, with the string as the value.
  3. As items are inserted into the dict, any repeated key (list) is overwritten by a shorter string, because we sorted in descending order beforehand.

You could save more time by removing tuple->list conversions at output.
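
As a rough sketch of that idea, here is a variation that differs from the code above in two ways: it skips the sort and instead compares string lengths while filling the dict, and it leaves the integer sequences as tuples in the output. The function name `shortest_per_list` is made up for illustration, and the result order relies on dicts preserving insertion order (guaranteed from Python 3.7).

```python
def shortest_per_list(pairs):
    """Map each distinct integer sequence to its shortest string in one pass."""
    best = {}
    for string, numbers in pairs:
        key = tuple(numbers)  # hashable stand-in for the list
        # store the first string seen for a key, or a strictly shorter one
        if key not in best or len(string) < len(best[key]):
            best[key] = string
    # keys stay as tuples, so no tuple -> list conversion is needed
    return [(string, key) for key, string in best.items()]


input_ = [('s', [1, 2]), ('ss', [1, 2]), ('ss', [1, 2, 3]), ('ss', [1, 3, 2])]
print(shortest_per_list(input_))
```

With this variant each group appears in the order its list was first seen in the input.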


To add: Python's dict.items() cannot be passed to reversed() in Python 3.7 or earlier. In that case, reverse list(dict.items()) instead.
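
A minimal sketch of that fallback, assuming `temp_dict` holds what the loop above produces for the sample input and that the dict keeps insertion order (guaranteed from Python 3.7):

```python
# contents of temp_dict after the loop, written out by hand for this example
temp_dict = {(1, 2, 3): 'ss', (1, 3, 2): 'ss', (1, 2): 's'}

# list() materializes the items, so reversed() receives a plain sequence
output = [(string, list(key)) for key, string in reversed(list(temp_dict.items()))]
print(output)
```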

Upvotes: 1
