Jephron

Reputation: 2758

Difference between iterating through a generator and converting to a list

I would have expected these two pieces of code to produce the same results:

from itertools import groupby

for i in list(groupby('aaaabb')):
    print i[0], list(i[1])

for i, j in groupby('aaaabb'):
    print i, list(j)

In one I convert the iterator returned by groupby to a list and iterate over that, and in the other I iterate over the returned iterator directly.

The output of this script is

a []
b ['b']


a ['a', 'a', 'a', 'a']
b ['b', 'b']

Why is this the case?

Edit: for reference, the result of groupby('aabbaa') looks like

('a', <itertools._grouper object at 0x10c1324d0>)
('b', <itertools._grouper object at 0x10c132250>)

Upvotes: 3

Views: 76

Answers (1)

Dietrich Epp

Reputation: 213807

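This is a quirk of the groupby function, presumably for performance: each group it yields is a lazy iterator over the same underlying iterable as the groupby object itself. Calling list(groupby(...)) advances the groupby object all the way to the end before any group has been read, so by the time you iterate over the earlier groups their items have already been skipped.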

From the itertools.groupby documentation:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:

groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    groups.append(list(g))      # Store group iterator as a list
    uniquekeys.append(k)
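
To see the shared-iterator behavior in isolation, here is a minimal sketch, written for Python 3, that advances the groupby object by hand with next(). The variable names (grouped, stale, fresh) are only for illustration and do not come from the documentation.

```python
from itertools import groupby

grouped = groupby('aaaabb')

# Take the first group, but do not read its items yet.
first_key, first_group = next(grouped)

# Advancing the groupby object moves the shared underlying iterator
# past the remaining items of the first group.
second_key, second_group = next(grouped)

stale = list(first_group)    # the first group was skipped over, so nothing is left
fresh = list(second_group)   # the current group can still be read in full

print(first_key, stale)      # a []
print(second_key, fresh)     # b ['b', 'b']
```

Reading first_group before the second call to next() would have returned all four 'a' items, which is what the plain for loop in the question does.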

So, you can convert each group to a list while the groupby object is still positioned on it:

for i in [(x, list(y)) for x, y in groupby('aabbaa')]:
    print i[0], i[1]
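
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch for Python 3; materialize_groups is a hypothetical name, not part of itertools.

```python
from itertools import groupby

def materialize_groups(iterable, key=None):
    """Return a list of (key, items) pairs, copying each group into a
    list while the groupby object is still positioned on that group."""
    return [(k, list(g)) for k, g in groupby(iterable, key)]

pairs = materialize_groups('aaaabb')
for k, items in pairs:
    print(k, items)
```

Because every group is copied before groupby moves on, the result can be stored, indexed, or iterated more than once, at the cost of holding all items in memory.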

Upvotes: 5
