user994165

Reputation: 9502

itertools.groupby() not grouping correctly

I have this data:

self.data = [(1, 1, 5.0),
             (1, 2, 3.0),
             (1, 3, 4.0),
             (2, 1, 4.0),
             (2, 2, 2.0)]

When I run this code:

for mid, group in itertools.groupby(self.data, key=operator.itemgetter(0)):

then list(group) for the first group is:

[(1, 1, 5.0),
 (1, 2, 3.0),
 (1, 3, 4.0)]

which is what I want.

But if I use 1 instead of 0:

for mid, group in itertools.groupby(self.data, key=operator.itemgetter(1)):

to group by the second element of each tuple, I only get:

[(1, 1, 5.0)]

even though another tuple, (2, 1, 4.0), also has 1 in position 1 (the second element).

Upvotes: 33

Views: 23478

Answers (3)

Shital Shah

Reputation: 68708

The function below "fixes" several annoyances with Python's itertools.groupby.

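import itertools

def groupby2(l, key=lambda x: x, val=lambda x: x, agg=list, sort=True):
    # Sort first (by default) so that all equal keys are adjacent for groupby.
    if sort:
        l = sorted(l, key=key)
    # agg runs while each group is still valid; list is the default so the
    # groups do not go stale when the outer generator advances.
    return ((k, agg(val(x) for x in v))
            for k, v in itertools.groupby(l, key=key))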

Specifically:

  1. It sorts the data for you (pass sort=False if it is already sorted), so all items with the same key end up in one group.
  2. key, val and agg can all be passed positionally as well as by name.
  3. The output is a clean generator of (key, grouped_values) tuples, where each value is transformed by the third parameter, val.
  4. Aggregation functions such as sum or statistics.mean are easy to apply to each group via agg.
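As a rough illustration of what groupby2 does under the hood for item 4, here is the same sort, group, map-with-val, aggregate pipeline written directly with itertools on the question's data. It averages the third element for each value of the second element; statistics.mean stands in for "avg" (Python has no built-in avg), and the variable names are only illustrative.

```python
import itertools
from operator import itemgetter
from statistics import mean

data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]
key = itemgetter(1)  # group by the second element
val = itemgetter(2)  # aggregate the third element

# sort -> groupby -> apply val to each item -> apply the aggregate to the group
averages = [(k, mean(val(x) for x in g))
            for k, g in itertools.groupby(sorted(data, key=key), key=key)]
print(averages)  # [(1, 4.5), (2, 2.5), (3, 4.0)]
```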

Example Usage

import itertools  # used inside groupby2
from operator import itemgetter

t = [('a', 1), ('b', 2), ('a', 3)]
# Group by the first element; sum the second elements of each group.
for k, v in groupby2(t, itemgetter(0), itemgetter(1), sum):
    print(k, v)

This prints:

a 4
b 2
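A related groupby behavior that any wrapper like this has to respect: all groups share the underlying iterator, so once the outer groupby iterator advances, the previous group is no longer available. That is why it is safer for agg to materialize each group (for example with list). A small demonstration on a toy string:

```python
import itertools

# Consuming each group before moving on works as expected.
fresh = [(k, list(g)) for k, g in itertools.groupby("aab")]

# Advancing the outer iterator first (here via list()) leaves earlier
# groups stale: their shared source has already moved past them,
# so the 'a' group comes back empty.
stale = [(k, list(g)) for k, g in list(itertools.groupby("aab"))]

print(fresh)  # [('a', ['a', 'a']), ('b', ['b'])]
print(stale)
```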


Upvotes: 1

Kostia R

Reputation: 2565

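Variant without sorting (via a dictionary). It makes a single O(n) pass instead of an O(n log n) sort, so it should perform better, especially on large inputs. Groups come out in the order in which each key first appears (dicts preserve insertion order since Python 3.7).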

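from collections import defaultdict

def full_group_by(l, key=lambda x: x):
    # Bucket every item under its key; items need not be adjacent or sorted.
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    # (key, list_of_items) pairs, in order of each key's first appearance.
    return d.items()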
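Applied to the question's data, the same dictionary-bucketing idea puts both tuples that have 1 in position 1 into one group, even though they are not adjacent. A minimal inline sketch (the name by_second is just illustrative):

```python
from collections import defaultdict
from operator import itemgetter

data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]
by_second = itemgetter(1)

# One pass over the data: append each tuple to the bucket for its second element.
groups = defaultdict(list)
for row in data:
    groups[by_second(row)].append(row)

for k, rows in groups.items():
    print(k, rows)
# 1 [(1, 1, 5.0), (2, 1, 4.0)]
# 2 [(1, 2, 3.0), (2, 2, 2.0)]
# 3 [(1, 3, 4.0)]
```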

Upvotes: 38

unutbu

Reputation: 879143

itertools.groupby collects together contiguous items with the same key: it starts a new group every time the key value changes. If you want all items with the same key in a single group, you have to sort self.data by that same key first:
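To see why the question gets only [(1, 1, 5.0)], here is what groupby yields on the unsorted data with key=operator.itemgetter(1). The key sequence 1, 2, 3, 1, 2 changes at every row, so each tuple lands in its own run and key 1 shows up twice:

```python
import itertools
import operator

data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]

# Without sorting, groupby only merges *adjacent* rows with equal keys.
runs = [(k, list(g)) for k, g in itertools.groupby(data, key=operator.itemgetter(1))]
for k, g in runs:
    print(k, g)
# 1 [(1, 1, 5.0)]
# 2 [(1, 2, 3.0)]
# 3 [(1, 3, 4.0)]
# 1 [(2, 1, 4.0)]
# 2 [(2, 2, 2.0)]
```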

key = operator.itemgetter(1)
for mid, group in itertools.groupby(sorted(self.data, key=key), key=key):
    print(mid, list(group))  # first line: 1 [(1, 1, 5.0), (2, 1, 4.0)]

Upvotes: 64
