Reputation: 43
Consider this code snippet:
>>> from itertools import chain
>>> foo = [0]
>>> for i in (1, 2):
...     bar = (range(a, a+i) for a in foo)
...     foo = chain(*list(bar))
...
>>> list(foo)
[0, 1]
This makes sense: in the first iteration of the loop, bar is equivalent to iter([[0]]) and foo evaluates to chain([0]), which is equivalent to iter([0]). Then, in the second iteration of the loop, bar is equivalent to iter([[0, 1]]) and foo becomes iter([0, 1]). That's why list(foo) is [0, 1].
I also get the same result for list(foo) when I use foo = sum(list(bar), []) rather than chain(*list(bar)).
Now consider this code snippet:
>>> from itertools import chain
>>> foo = [0]
>>> for i in (1, 2):
...     bar = (range(a, a+i) for a in foo)
...     foo = chain.from_iterable(bar)
...
>>> list(foo)
[0, 1, 1, 2]
As you can see, the only difference is the line foo = chain.from_iterable(bar), which uses itertools.chain.from_iterable rather than itertools.chain.
It seems to me that itertools.chain(*list(iterable)) is roughly equivalent to itertools.chain.from_iterable(iterable), but that is not the case here. So why is the final result different?
Upvotes: 4
Views: 945
Reputation: 2189
The difference comes from the use of generators and the delayed evaluation in foo = chain.from_iterable(bar). The two programs would be equivalent if you changed this line to foo = chain.from_iterable(list(bar)), which forces evaluation of the bar generator and grounds foo in concrete values.
As written, the two programs are semantically different: the first applies chain to a list, while the second applies chain.from_iterable to a generator. A generator can be thought of as a function handle in some respects: it defers execution until list(foo) is finally called after the loop finishes.
[This answer was tested in Python 3, where range returns a lazy range object rather than a list. It may behave differently in Python 2.x, where range returns the entire list...]
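A minimal sketch of that claim, assuming Python 3. The helper name run and its materialize flag are made up for illustration; they are not part of the question's code.

```python
from itertools import chain

def run(materialize):
    # Hypothetical helper: replays the loop from the question.
    foo = [0]
    for i in (1, 2):
        bar = (range(a, a + i) for a in foo)
        # list(bar) consumes the generator right away, while i still has
        # the value of the current iteration; passing bar itself defers that.
        foo = chain.from_iterable(list(bar) if materialize else bar)
    return list(foo)

eager = run(materialize=True)
lazy = run(materialize=False)
print(eager)  # [0, 1]
print(lazy)   # [0, 1, 1, 2]
```

With materialize=True the result matches the chain(*list(bar)) version from the question.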
Upvotes: 1
Reputation: 353149
The difference is that in chain(*list(bar)), bar is exhausted immediately, whereas in chain.from_iterable(bar), it's not. And the definition of bar uses i, which is late-binding: it picks up the value of i not at the time of definition, but from the name i at the time it's evaluated.
In other words, when you use foo = chain.from_iterable(bar), bar is not evaluated yet. When you then call list(foo) and it "calls" bar, the i in the definition picks up the value that the name i currently refers to, which is 2.
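The late binding can be seen in isolation with a smaller sketch (the names gen and result are illustrative only). Note that only the iterable in the first for clause is evaluated when the generator expression is created; the rest of the expression runs when the generator is consumed.

```python
i = 1
gen = (a + i for a in [0, 10])  # nothing is computed yet; [0, 10] is already bound
i = 2                            # rebind the name before consuming the generator
result = list(gen)               # a + i is evaluated now, with i == 2
print(result)  # [2, 12]
```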
So if we change i manually, we should be able to change the result appropriately:
>>> from itertools import chain
>>> foo = [0]
>>> for i in (1, 2):
...     bar = (range(a, a+i) for a in foo)
...     foo = chain.from_iterable(bar)
...
>>> i = 10
>>> list(foo)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
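If the lazy version should keep the value of i from each iteration, one possible approach (an assumption on my part, not something from the question) is to pass i through a function parameter so that each generator closes over its own copy. The helper make_bar is hypothetical.

```python
from itertools import chain

def make_bar(foo, i):
    # Hypothetical helper: i is a parameter here, so each generator
    # sees the value it was created with, not the loop variable.
    return (range(a, a + i) for a in foo)

foo = [0]
for i in (1, 2):
    foo = chain.from_iterable(make_bar(foo, i))

i = 10  # rebinding the loop variable no longer affects the result
result = list(foo)
print(result)  # [0, 1]
```

This stays lazy, yet gives the same output as the chain(*list(bar)) version.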
Upvotes: 6