Reputation: 53
I am new to Python. While reading the Python standard library reference, I got confused by the grouper() example in the itertools recipes section.
I put the sample code in a small program like this:
from itertools import zip_longest
import copy

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    # print each string in args
    #c = copy.deepcopy(args)
    #for a in c:
    #    print(list(a))
    return zip_longest(*args, fillvalue=fillvalue)

def main():
    print("this is our first test script file")
    g = grouper('ABCDEFG', 3, 'x')
    # print each string in results
    #for s in g:
    #    print(list(s))

main()
If I uncomment the commented-out lines, it produces the following output:
['A', 'B', 'C', 'D', 'E', 'F', 'G']
[]
[]
['A', 'B', 'C']
['D', 'E', 'F']
['G', 'x', 'x']
This doesn't look right to me, because the contents of the args variable are:
['A', 'B', 'C', 'D', 'E', 'F', 'G']
[]
[]
How could the zip_longest() call then produce the results below?
['A', 'B', 'C']
['D', 'E', 'F']
['G', 'x', 'x']
It should be A,B,C,D,... because the second and third lists in args are empty. Or did I miss something?
Can anyone explain it to me?
Upvotes: 1
Views: 3147
Reputation: 16623
zip and zip_longest are quite different from deepcopy when it comes to how they consume their arguments.
grouper works because zip and zip_longest take one element at a time from each argument. For example, consider this:
i1 = i2 = i3 = iter([1, 2, 3, 4, 5, 6])
zip(i1, i2, i3)
Because i1, i2, and i3 share the same iterator, advancing one also advances the others. For each output tuple, zip does this:
Take the next item from i1.
Take the next item from i2.
Take the next item from i3.
For the example, something like this happens:
First iteration:
i1 => 1
i2 => 2
i3 => 3
Result: (1, 2, 3)
Second iteration:
i1 => 4
i2 => 5
i3 => 6
Result: (4, 5, 6)
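The two iterations above can be reproduced by hand with next(). This is only a minimal sketch using the same list as the example; the names first, second, shared, and zipped are illustrative and not part of the original answer.

```python
# One iterator object bound to three names, as in the example above.
i1 = i2 = i3 = iter([1, 2, 3, 4, 5, 6])

# Roughly what zip does per output tuple: one next() call per argument, in order.
first = (next(i1), next(i2), next(i3))
second = (next(i1), next(i2), next(i3))

print(first)   # (1, 2, 3)
print(second)  # (4, 5, 6)

# zip itself produces the same grouping on a fresh shared iterator.
shared = iter([1, 2, 3, 4, 5, 6])
zipped = list(zip(shared, shared, shared))
print(zipped)  # [(1, 2, 3), (4, 5, 6)]
```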
Now, deepcopy, in this case, only copies the iterators. It doesn't consume them in any way. Your for loop, however, does consume them, and because the copies still share one iterator, the first list() call drains it for all three:
i1 => 1, 2, 3, 4, 5, 6, StopIteration raised
i2 => StopIteration raised
i3 => StopIteration raised
Therefore, you get the result that you see.
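As a rough check of this explanation, the sketch below rebuilds the question's args list and inspects the deep copies. It assumes CPython, where built-in string iterators can be deep-copied; the names copies, same_copy, independent, drained, and groups are illustrative only.

```python
import copy
from itertools import zip_longest

args = [iter('ABCDEFG')] * 3   # three references to one iterator
copies = copy.deepcopy(args)

# deepcopy preserves the sharing: the copies are one new iterator listed three times.
same_copy = copies[0] is copies[1] is copies[2]
independent = copies[0] is not args[0]
print(same_copy, independent)  # True True

# list() drains the shared copy on the first pass, so the next two passes find it empty.
drained = [list(c) for c in copies]
print(drained)  # [['A', 'B', 'C', 'D', 'E', 'F', 'G'], [], []]

# The original iterator was never advanced, so zip_longest still sees every letter.
groups = list(zip_longest(*args, fillvalue='x'))
print(groups)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

So the printed copy of args shows what list() does to a shared iterator, not what zip_longest receives.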
Upvotes: 2
Reputation: 20500
The normal zip function stops at the shortest iterable and zips only that many values; if another list is longer, its extra values are ignored.
Below you can see that the second list has length 4, but zip ignores its last element.
From the docs: https://docs.python.org/3.3/library/functions.html#zip
Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted.
print(list(zip([1,2,3],['A','B','C','D'])))
#[(1, 'A'), (2, 'B'), (3, 'C')]
In contrast, zip_longest continues until the longest iterable is exhausted.
Below you can see that the second list has length 4, and zip_longest does not ignore its last element.
From the docs: https://docs.python.org/3.0/library/itertools.html#itertools.zip_longest
Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.
import itertools as it
print(list(it.zip_longest([1,2,3],['A','B','C','D'])))
#[(1, 'A'), (2, 'B'), (3, 'C'), (None, 'D')]
The fillvalue argument fills the missing values with a default value. For example, below I use fillvalue='X':
import itertools as it
print(list(it.zip_longest([1,2,3],['A','B','C','D'], fillvalue='X')))
#[(1, 'A'), (2, 'B'), (3, 'C'), ('X', 'D')]
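To tie this back to the question, here is a small sketch of the grouper recipe as quoted there, compared with plain zip on a shared iterator. The names padded, shared, and truncated are illustrative only.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n  # n references to one shared iterator
    return zip_longest(*args, fillvalue=fillvalue)

# zip_longest pads the incomplete last group with the fill value.
padded = [''.join(chunk) for chunk in grouper('ABCDEFG', 3, 'x')]
print(padded)  # ['ABC', 'DEF', 'Gxx']

# Plain zip stops as soon as one argument is exhausted, so the leftover 'G' is dropped.
shared = iter('ABCDEFG')
truncated = [''.join(chunk) for chunk in zip(shared, shared, shared)]
print(truncated)  # ['ABC', 'DEF']
```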
Upvotes: 0