Reputation: 9502
I have this data:
self.data = [(1, 1, 5.0),
(1, 2, 3.0),
(1, 3, 4.0),
(2, 1, 4.0),
(2, 2, 2.0)]
When I run this code:
for mid, group in itertools.groupby(self.data, key=operator.itemgetter(0)):
    print(list(group))
I get:
[(1, 1, 5.0),
(1, 2, 3.0),
(1, 3, 4.0)]
which is what I want.
But if I use 1 instead of 0
for mid, group in itertools.groupby(self.data, key=operator.itemgetter(1)):
to group by the second number in the tuples, I only get:
[(1, 1, 5.0)]
even though other tuples also have 1 at index 1 (the second position).
Upvotes: 33
Views: 23478
Reputation: 68708
The function below "fixes" several annoyances with Python's itertools.groupby.
def groupby2(l, key=lambda x:x, val=lambda x:x, agg=lambda x:x, sort=True):
    if sort:
        l = sorted(l, key=key)
    return ((k, agg((val(x) for x in v)))
            for k, v in itertools.groupby(l, key=key))
Specifically:
key can be used as a named parameter only.
The output is tuple(key, grouped_values), where the values are specified by the 3rd parameter.
Example Usage
import itertools
from operator import itemgetter
from statistics import *
t = [('a',1), ('b',2), ('a',3)]
for k, v in groupby2(t, itemgetter(0), itemgetter(1), sum):
    print(k, v)
This prints,
a 4
b 2
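To make the val, agg and sort parameters a bit more concrete, here is a small self-contained sketch. It repeats groupby2 from above so it runs on its own; the variable names as_lists, means and unsorted_sums are only illustrative. Passing list as agg shows the raw grouped values, and sort=False falls back to plain groupby behaviour, where only contiguous items are grouped.

```python
import itertools
from operator import itemgetter
from statistics import mean

def groupby2(l, key=lambda x: x, val=lambda x: x, agg=lambda x: x, sort=True):
    if sort:
        l = sorted(l, key=key)
    return ((k, agg(val(x) for x in v))
            for k, v in itertools.groupby(l, key=key))

t = [('a', 1), ('b', 2), ('a', 3)]

# agg=list keeps the grouped values instead of reducing them
as_lists = list(groupby2(t, itemgetter(0), itemgetter(1), list))

# any callable that accepts an iterable can serve as agg
means = dict(groupby2(t, itemgetter(0), itemgetter(1), mean))

# with sort=False the two 'a' items are not adjacent, so they stay separate
unsorted_sums = list(groupby2(t, itemgetter(0), itemgetter(1), sum, sort=False))

print(as_lists)
print(means)
print(unsorted_sums)
```

Note that each agg call receives a generator, so it has to consume the values right away, which list, sum and mean all do.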
Upvotes: 1
Reputation: 2565
Variant without sorting (via dictionary). It should perform better, since it makes a single pass over the data instead of sorting it first.
from collections import defaultdict

def full_group_by(l, key=lambda x: x):
    # Collect items into one list per key, in a single pass
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()
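As a rough usage sketch, this is how the function could be applied to the data from the question, grouping by the second element. The function is repeated so the snippet is self-contained; the names data and grouped are just for illustration.

```python
from collections import defaultdict
from operator import itemgetter

def full_group_by(l, key=lambda x: x):
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()

data = [(1, 1, 5.0),
        (1, 2, 3.0),
        (1, 3, 4.0),
        (2, 1, 4.0),
        (2, 2, 2.0)]

# Group by the second element of each tuple; no sorting needed
grouped = dict(full_group_by(data, key=itemgetter(1)))
for k, group in grouped.items():
    print(k, group)
```

Groups come out in the order their keys first appear in the input, and items inside each group keep their original order.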
Upvotes: 38
Reputation: 879143
itertools.groupby collects together contiguous items with the same key.
If you want all items with the same key grouped together, you have to sort self.data
by that key first.
for mid, group in itertools.groupby(
        sorted(self.data, key=operator.itemgetter(1)), key=operator.itemgetter(1)):
    print(mid, list(group))
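To see the difference side by side, here is a minimal sketch using the data from the question as a plain list (data, second, unsorted_runs and sorted_groups are names chosen for this example). Without sorting, the keys at index 1 run 1, 2, 3, 1, 2, so every item ends up in its own run; after sorting by that key, equal keys are adjacent and get grouped together.

```python
import itertools
import operator

data = [(1, 1, 5.0),
        (1, 2, 3.0),
        (1, 3, 4.0),
        (2, 1, 4.0),
        (2, 2, 2.0)]

second = operator.itemgetter(1)

# Unsorted: groupby starts a new group every time the key changes
unsorted_runs = [(k, list(g)) for k, g in itertools.groupby(data, key=second)]

# Sorted by the same key first: each key appears in exactly one group
sorted_groups = [(k, list(g))
                 for k, g in itertools.groupby(sorted(data, key=second), key=second)]

print(len(unsorted_runs))
for k, group in sorted_groups:
    print(k, group)
```

Because sorted is stable, items within each group keep their original relative order.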
Upvotes: 64