Reputation: 5345
I have list like this:
[['Richard', 1, 'Group A'], ['Mark', 3, 'Group A'],
['Alan', 4, 'Group B'], ['Dave', 3, 'Group B'],
['Gordon', 2, 'Group A']]
That I would like to filter so that only the lowest number (Richard has a number of 1, Mark is 3, Alan is 4, etc.) of each group is kept so that the list would look like:
[['Richard', 1, 'Group A'], ['Dave', 3, 'Group B']]
I'm sorting with the lambda key:
filteredList = sorted(list,key=lambda x: x[2])
But I'm blocked when it comes to sorting within each group and getting rid of higher ranking individuals.
Is there a simple way to achieve this in Python or should I iterate and test each row?
Upvotes: 2
Views: 185
Reputation: 310049
This is a simple "bin and find min" problem. The first pass, we'll bin:
from collections import defaultdict
bins = defaultdict(list)
for item in input_list:
bins[item[2]].append(item)
Now we just need to take the minimum of each of the bins:
from operator import itemgetter
get_second = itemgetter(1)
results = [min(group, key=get_second) for group in bins.values()]
Up to this point, we have an O(N) algorithm (binning happens in O(1) time for each of the N
items we put in the dict) and finding the min
runs over each of the items exactly once more -- so that's O(N) too...
If necessary, you can then sort the results by group name:
results.sort(key=itemgetter(2))
We can do the min
step and binning step at the same time to save a bit of memory (e.g. if the input is coming from a generator and has lots of items):
from operator import itemgetter
get_second = itemgetter(1)
results = {}
for item in input_stream:
group = item[2]
if group not in results:
results[group] = item
else:
results[group] = min(item, results[group], key=get_second)
This is effectively a different implementation of the same idea as the solution provided by @wim. To order the results when you're done (if necessary):
ordered_results = sorted(results.values(), key=itmegetter(2))
In this way, we only keep one result for each group around. The cost is a bit of extra code complexity.
Upvotes: 3
Reputation: 107337
You can use collections.defaultdict
in order to group your sub lists based on 3rd item then use min()
function with a proper key within a list comprehension in order to get the expected result:
In [61]: from operator import itemgetter
In [62]: from collections import defaultdict
In [63]: lst = [['Richard', 1, 'Group A'], ['Mark', 3, 'Group A'], ['Alan', 4, 'Group B'], ['Dave', 3, 'Group B'], ['Gordon', 2, 'Group A']]
In [64]: d = defaultdict(list)
In [65]: for i, j, k in lst:
d[k].append([i, j, k])
....:
In [66]: [min(sub, key=itemgetter(1)) for sub in d.values()]
Out[66]: [['Dave', 3, 'Group B'], ['Richard', 1, 'Group A']]
You can do this even in a more optimized manner by passing a customized object to defaultdict()
, so that it only appends new items if they have a smaller second item:
from collections import defaultdict
class MyList(list):
def __init__(self, *args, **kwargs):
super(MyList, self).__init__(*args, **kwargs)
def special_append(self, arg):
if not self:
self.append(arg)
elif arg[1] < self[0][1]:
self[0] = arg
Demo:
lst = [['Richard', 1, 'Group A'], ['Mark', 3, 'Group A'], ['Alan', 4, 'Group B'], ['Dave', 3, 'Group B'], ['Gordon', 2, 'Group A']]
d = defaultdict(MyList)
for i, j, k in lst:
d[k].special_append([i, j, k])
print(d)
defaultdict(<class '__main__.MyList'>, {'Group B': [['Dave', 3, 'Group B']], 'Group A': [['Richard', 1, 'Group A']]})
Upvotes: 3
Reputation: 363053
Re-key the data off of the group name. Don't name your data list
because it shadows a built-in name.
>>> results = {}
>>> for name, number, group in data:
... key = group
... value = number, name
... results[key] = min(value, results.get(key, value))
...
>>> [[name, number, group] for group, (number, name) in results.items()]
[['Dave', 3, 'Group B'], ['Richard', 1, 'Group A']]
Pure python data structures handle this problem quite nicely, the sort/itertools approach is suboptimal and increases complexity from O(n) to O(n logn).
Upvotes: 4
Reputation: 54223
I agree with TemporalWolf in the comments that itertools.groupby
is the right approach to take.
from itertools import groupby
from operator import itemgetter
in_ = [['Richard', 1, 'Group A'], ['Mark', 3, 'Group A'],
['Alan', 4, 'Group B'], ['Dave', 3, 'Group B'],
['Gordon', 2, 'Group A']]
groups = groupby(in_, key=itemgetter(2))
# operator.itemgetter(N) is equivalent to lambda x: x[N]
The groupby function creates something akin to:
[("Group A", [['Richard', 1, 'Group A'],
['Mark', 3, 'Group A'],
['Gordon', 2, 'Group A']]),
("Group B", [['Alan', 4, 'Group B'],
['Dave', 3, 'Group B']])]
Then it's easy enough to iterate and use min
to find the results
minimums = []
for _, vals in groups:
minimums.append(min(vals, key=itemgetter(1)))
Upvotes: 0
Reputation: 29089
You could make it something like key=lambda x: (x[2], x[1])
. Then you have two-level sorting.
Alternatively, operator.itemgetter
can take multiple indices.
Upvotes: 0