Avnerium

Reputation: 43

Weird behavior of itertools.chain.from_iterable

Consider this code snippet:

>>> from itertools import chain
>>> foo = [0]
>>> for i in (1, 2):
...     bar = (range(a, a+i) for a in foo)
...     foo = chain(*list(bar))
...
>>> list(foo)
[0, 1]

This makes sense - in the first iteration of the loop, bar is equivalent to iter([[0]]) and foo evaluates to chain([0]), which is equivalent to iter([0]). Then, in the second iteration of the loop, bar is now equivalent to iter([[0, 1]]) and foo becomes iter([0, 1]). That's why list(foo) is [0, 1].

I also get the same result for list(foo) when I use foo = sum(list(bar), []) instead of chain(*list(bar)).

Now consider this code snippet:

>>> from itertools import chain
>>> foo = [0]
>>> for i in (1, 2):
...     bar = (range(a, a+i) for a in foo)
...     foo = chain.from_iterable(bar)
...
>>> list(foo)
[0, 1, 1, 2]

As you can see, the only difference is the line foo = chain.from_iterable(bar), which uses itertools.chain.from_iterable rather than itertools.chain.

It seems to me that itertools.chain(*list(iterable)) should be roughly equivalent to itertools.chain.from_iterable(iterable), but that is not the case here. Why is the final result different?

Upvotes: 4

Views: 945

Answers (2)

svohara

Reputation: 2189

The difference is the use of generators and the delayed evaluation of foo = chain.from_iterable(bar). The two programs would be equivalent if you changed this line to be foo = chain.from_iterable(list(bar)), which forces the evaluation of the bar generator to ground foo in concrete values.
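A minimal sketch of that change, written as a plain script rather than a REPL session. The helper name build and its materialize flag are hypothetical, introduced only to run both variants side by side; Python 3 is assumed.

```python
from itertools import chain

def build(materialize):
    """Run the question's loop; optionally force bar with list() on each pass."""
    foo = [0]
    for i in (1, 2):
        bar = (range(a, a + i) for a in foo)
        # list(bar) consumes the generator now, while i still holds this pass's value
        foo = chain.from_iterable(list(bar) if materialize else bar)
    return list(foo)

lazy_result = build(materialize=False)
eager_result = build(materialize=True)
print(lazy_result)   # [0, 1, 1, 2]
print(eager_result)  # [0, 1]
```

With materialize=True the result matches the chain(*list(bar)) version from the question.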

Otherwise, as written, the two programs are semantically different: the first applies chain to a list, while the second applies chain.from_iterable to a generator, which can be thought of as a function handle in some respects and defers execution until list(foo) is finally called after the loop finishes.

[This answer was tested in Python 3, where range returns a lazy range object rather than a list. It may behave differently in Python 2.x, where range returns the entire list.]

Upvotes: 1

DSM

Reputation: 353149

The difference is that in chain(*list(bar)), bar is exhausted immediately, whereas in chain.from_iterable(bar), it's not. And in the definition of bar, i is used, which is late-binding: it picks up the value of i not at the time of definition, but from the name i at the time it's evaluated.

IOW, when you use foo = chain.from_iterable(bar), bar is not evaluated yet. When you then call list(foo), and it "calls" bar, the i in the definition picks up the value that the name i currently refers to -- which is 2.
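The same effect can be seen in isolation with a small sketch (the names scale, data and scaled are made up for illustration). It also shows why foo itself is not affected: the outermost iterable of a generator expression is evaluated when the expression is created, while every other name in it is looked up only when the generator runs.

```python
scale = 1
data = [1, 2, 3]
scaled = (x * scale for x in data)  # data is captured now; scale is looked up later

scale = 100       # rebinding affects the pending generator
data = [7, 8, 9]  # rebinding does not: the generator already holds the old list

result = list(scaled)
print(result)  # [100, 200, 300]
```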

So if we change i manually, we should be able to change the result appropriately:

>>> from itertools import chain
>>> foo = [0]
>>> for i in (1, 2):
...     bar = (range(a, a+i) for a in foo)
...     foo = chain.from_iterable(bar)
...
>>> i = 10
>>> list(foo)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
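If the lazy behaviour is wanted but each pass should keep its own value of i, one option is to build the generator inside a helper function, so that the value becomes a local of that call. This is only a sketch; the helper name ranges_from and its parameters are hypothetical.

```python
from itertools import chain

def ranges_from(source, width):
    # width is a parameter, so each call keeps its own value
    # no matter what the loop variable is rebound to afterwards
    return (range(a, a + width) for a in source)

foo = [0]
for i in (1, 2):
    foo = chain.from_iterable(ranges_from(foo, i))

i = 10  # rebinding i no longer affects the pending generators
result = list(foo)
print(result)  # [0, 1]
```

Nothing is evaluated until list(foo) runs, yet the result matches the chain(*list(bar)) version.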

Upvotes: 6
