BradW

Reputation: 53

grouper() example in itertools

I am new to Python. While reading the Python standard library reference, I got confused by the grouper() example in the itertools recipes section.

I put the sample code in a small program like this:

from itertools import zip_longest
import copy

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    # print each string in args
    #c = copy.deepcopy(args)
    #for a in c:
    #    print(list(a))
    return zip_longest(*args, fillvalue=fillvalue)

def main():
    print("this is our first test script file")
    g = grouper('ABCDEFG', 3, 'x')
    # print each string in results
    #for s in g:
    #    print(list(s))

main()

If we uncomment the commented-out lines, the program produces the following output:

['A', 'B', 'C', 'D', 'E', 'F', 'G']
[]
[]
['A', 'B', 'C']
['D', 'E', 'F']
['G', 'x', 'x']

This doesn't look right to me, because the contents of the args variable are:

['A', 'B', 'C', 'D', 'E', 'F', 'G']
[]
[]

How can the zip_longest() call produce the results below?

['A', 'B', 'C']
['D', 'E', 'F']
['G', 'x', 'x']

I would expect A, B, C, D, ... because the second and third lists in args are empty. Or did I miss something?

Can anyone explain it to me?

Upvotes: 1

Views: 3147

Answers (2)

iz_

Reputation: 16623

zip and zip_longest are quite different from deepcopy when it comes to how they consume their arguments.

grouper works because zip and zip_longest take one element at a time from each argument. For example, consider this:

i1 = i2 = i3 = iter([1, 2, 3, 4, 5, 6])
zip(i1, i2, i3)

Because i1, i2, and i3 share the same iterator, advancing one also advances the others. zip does this:

  1. Take an element from i1.
  2. Take an element from i2.
  3. Take an element from i3.
  4. Yield a tuple of these elements.
  5. Repeat from step 1.
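
The steps above can be sketched as a small generator. This is only an illustration: simple_zip is a hypothetical name, and the code is a simplified stand-in for zip, not how CPython actually implements it.

```python
def simple_zip(*iterators):
    """Hypothetical, simplified stand-in for zip(); not the real implementation."""
    while True:
        row = []
        for it in iterators:
            try:
                # Steps 1-3: take one element from each argument in turn.
                row.append(next(it))
            except StopIteration:
                # Stop as soon as any argument runs out.
                return
        # Step 4: yield the tuple, then loop again (step 5).
        yield tuple(row)


# The same iterator object is passed three times, as grouper() does.
shared = iter([1, 2, 3, 4, 5, 6])
result = list(simple_zip(shared, shared, shared))
print(result)  # [(1, 2, 3), (4, 5, 6)]
```

Each next() call advances the one shared iterator, so consecutive elements end up in the same tuple.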

For the example, something like this happens:

First iteration:

  1. Take an element from i1. => 1
  2. Take an element from i2. => 2
  3. Take an element from i3. => 3
  4. Yield a tuple of these elements => (1, 2, 3)

Second iteration:

  1. Take an element from i1. => 4
  2. Take an element from i2. => 5
  3. Take an element from i3. => 6
  4. Yield a tuple of these elements => (4, 5, 6)

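Now, deepcopy, in this case, only copies the iterators; it doesn't consume them in any way. Because all three entries of args are the same iterator object, the copy is likewise a single iterator referenced three times. Your for loop, however, does consume the copies: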

  1. Take everything from i1. => 1, 2, 3, 4, 5, 6, StopIteration raised
  2. Take everything from i2. => StopIteration raised
  3. Take everything from i3. => StopIteration raised

Therefore, you get the result that you see.
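
A minimal sketch of that behaviour, assuming (as the output in the question suggests) a Python version in which string iterators can be deep-copied. The variable names copies, drained, and untouched are made up for this illustration.

```python
import copy

# Three references to one and the same iterator, as in grouper().
args = [iter('ABCDEFG')] * 3

# deepcopy preserves the sharing: one copied iterator, referenced three times.
copies = copy.deepcopy(args)
same_copy = copies[0] is copies[1] is copies[2]

# The first list() call drains the shared copy; the other two find it empty.
drained = [list(c) for c in copies]

# The original iterator was never advanced, so zip_longest still sees all items.
untouched = list(args[0])

print(same_copy)
print(drained)
print(untouched)
```

The printed lists match the first three lines of output in the question, and the original iterator still yields every letter afterwards.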

Upvotes: 2

Devesh Kumar Singh

Reputation: 20500

The built-in zip function stops at the shortest iterable: it zips together only as many values as the shortest one provides and ignores the remaining values of any longer one.
In the example below, the second list has length 4, but its last element is ignored. From the docs: https://docs.python.org/3.3/library/functions.html#zip

Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted.

print(list(zip([1,2,3],['A','B','C','D'])))
#[(1, 'A'), (2, 'B'), (3, 'C')]

zip_longest, by contrast, continues until the longest iterable is exhausted. In the example below, the second list has length 4, and zip_longest does not ignore its last element.
From the docs: https://docs.python.org/3.0/library/itertools.html#itertools.zip_longest

Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.

import itertools as it
print(list(it.zip_longest([1,2,3],['A','B','C','D'])))
#[(1, 'A'), (2, 'B'), (3, 'C'), (None, 'D')]

The fillvalue argument replaces the missing values with a value of your choice. For example, below I use fillvalue='X':

import itertools as it
print(list(it.zip_longest([1,2,3],['A','B','C','D'], fillvalue='X')))
#[(1, 'A'), (2, 'B'), (3, 'C'), ('X', 'D')]
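
To tie this back to grouper(), here is a short sketch of what happens when zip_longest receives the same iterator three times, which is what [iter(iterable)] * n produces. The names shared and groups are chosen only for this example.

```python
from itertools import zip_longest

# One iterator passed three times: each tuple takes three consecutive items.
shared = iter('ABCDEFG')
groups = list(zip_longest(shared, shared, shared, fillvalue='x'))
print(groups)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

After 'G' the shared iterator is exhausted, so the last two slots of the final tuple are filled with fillvalue.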

Upvotes: 0
