Andreas Vinter-Hviid

Reputation: 1482

Generating iterables of iterables with Python itertools (using the repeat function)

While experimenting with functional programming in Python, I noticed a difference between two expressions that I believed should give the same result.

In particular, what I want is an iterable that consists of (or should I say yields?) other iterables. A simple example of what I want to do could be:

import itertools as itr
itr.repeat(itr.repeat(1,5),3)

That is, an iterable consisting of 3 iterables, each of which consists of 5 occurrences of the number 1. This is, however, not what happens. What I get instead (translated to lists) is:

[[1,1,1,1,1],[],[]]

That is, the innermost iterable is not copied (it seems); instead, the same iterable is used again and again, so it runs out of elements.

A version of this that does work, using map, is:
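For reference, here is a minimal sketch that reproduces the output above. It assumes "translated to lists" means calling list() on each inner iterable in turn; the names outer and as_lists are only for illustration.

```python
import itertools as itr

# repeat() hands out the *same* inner iterator three times.
outer = itr.repeat(itr.repeat(1, 5), 3)

# list() consumes the inner iterator, so only the first conversion
# finds any elements left.
as_lists = [list(inner) for inner in outer]
print(as_lists)  # [[1, 1, 1, 1, 1], [], []]
```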

import itertools as itr
map(lambda x: itr.repeat(1,5), range(3))

This produces the result I expect:

[[1,1,1,1,1],[1,1,1,1,1],[1,1,1,1,1]]

I don't understand why this works, while the method using only repeat does not. Maybe it has something to do with the fact that in the map version, the iterable coming from repeat is wrapped in a lambda, but should that make a difference? As far as I see it, the only difference between lambda x: itr.repeat(1,5) and itr.repeat(1,5) is that the first one takes an argument (which it then throws away) while the other one does not.

Upvotes: 1

Views: 463

Answers (5)

Ben

Reputation: 71535

Your intuition is correct: the problem is that repeat gives you an iterator that keeps yielding the same object, not a copy of the object. The iterator returned by repeat(1, 5) can only be iterated once; every time the next item is yielded, it is permanently gone from the iterator.

The difference between lambda x: itr.repeat(1,5) and itr.repeat(1,5) is the difference between code and data. When you pass the "bare" repeat call, it has already executed and returned an iterator object, and it is that object which is passed. When you pass the lambda, itr.repeat(1,5) is code inside a function that has not been executed yet, and it is the function that is passed. Each time the lambda is called, the repeat call is evaluated and returns an iterator, so you get a new iterator every time.

Since map calls its argument function for each element of the collection, rather than calling it once and reusing the result, you get separate, independent iterator objects. Since repeat just repeatedly yields the object you gave it originally, you get multiple references to a single iterator object.
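One way to check this is to compare object identities. The following is only a sketch; the variable names are made up for the example.

```python
import itertools as itr

# repeat() yields one and the same inner object each time.
from_repeat = list(itr.repeat(itr.repeat(1, 5), 3))
same_object = all(item is from_repeat[0] for item in from_repeat)

# map() calls the lambda once per element, so each call builds a new object.
from_map = list(map(lambda x: itr.repeat(1, 5), range(3)))
distinct_ids = len({id(item) for item in from_map})

print(same_object)   # True
print(distinct_ids)  # 3
```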

This is basically the same distinction as between these two snippets:

a = itr.repeat(1, 5)
b = itr.repeat(1, 5)

and

a = itr.repeat(1, 5)
b = a

If you call repeat once and then pass around the resulting object, there is only one iterator, and consuming it from any of the places you passed it to is visible from all of those places. If you call repeat multiple times, you have multiple independent iterators.
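A small sketch of that difference, using next() to consume one element and then counting what is left (the names c, d, independent and shared are just for illustration):

```python
import itertools as itr

# Two separate calls: two independent iterators.
a = itr.repeat(1, 5)
b = itr.repeat(1, 5)
next(a)  # consume one element from a only
independent = (len(list(a)), len(list(b)))
print(independent)  # (4, 5)

# One call, two names: both names refer to the same iterator.
c = itr.repeat(1, 5)
d = c
next(c)  # consuming through c is visible through d
shared = len(list(d))
print(shared)  # 4
```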

Upvotes: 0

Steve Jessop

Reputation: 279325

The difference is that itertools.repeat takes an object as its first argument, and when iterated it yields that same object multiple times. In this case, that object can only be iterated once before it is exhausted, hence the result you see.

map takes a callable object as its first argument, and it calls that object multiple times, each time yielding the result.

So, in your first code snippet there is only ever one object yielding 1 five times. In your second snippet, there is one lambda object, but each time it is called it creates a new iterator object yielding 1 five times.

To get what you want you would normally write either:

(itr.repeat(1,5) for _ in range(3))

to get multiple independent iterators that each yield 1 five times, or:

itr.repeat(tuple(itr.repeat(1,5)),3)

since a tuple, unlike the return from itr.repeat, can be iterated repeatedly.
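As a rough sketch of the tuple variant: the inner repeat is materialised once, and the resulting tuple can then be iterated any number of times. Note that all three items are still the same tuple object, which is harmless here because tuples are immutable. The names inner, outer and result are illustrative.

```python
import itertools as itr

# Materialise the inner iterator once into a re-iterable tuple.
inner = tuple(itr.repeat(1, 5))
outer = itr.repeat(inner, 3)

# Each list() call starts a fresh pass over the same tuple.
result = [list(item) for item in outer]
print(result)  # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
```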

Or of course, since this example is small you could forget about generators and just write:

((1,)*5,)*3

which is concise but a bit obscure.

Your problem is similar to the difference between the following:

# there is only one inner list
foo = [[]] * 3
foo[0].append(0)
foo
# [[0], [0], [0]]

# there are three separate inner lists
bar = [[] for _ in range(3)]
bar[0].append(0)
bar
# [[0], [], []]

Upvotes: 2

Martijn Pieters

Reputation: 1123500

The functions in the itertools library return iterators, and an iterator can only be iterated over once. After that it is exhausted and will not produce its values again:

>>> import itertools as itr
>>> repeater = itr.repeat(1, 5)
>>> list(repeater)
[1, 1, 1, 1, 1]
>>> list(repeater)
[]

The map() version, on the other hand, produces a new iterator object on every call to the lambda. You could have used a list comprehension too:

[itr.repeat(1, 5) for _ in range(3)]

Now each object in that list is a separate iterator object that can be iterated over independently. You can test that the objects are different:

>>> repeaters = map(lambda x: itr.repeat(1,5), range(3))
>>> for pair in itr.combinations(repeaters, 2):
...     print(id(pair[0]), id(pair[1]), 'pair[0] is pair[1]', pair[0] is pair[1])
... 
4557097936 4557097808 pair[0] is pair[1] False
4557097936 4557105040 pair[0] is pair[1] False
4557097808 4557105040 pair[0] is pair[1] False

Upvotes: 2

kindall

Reputation: 184280

You're repeating an iterator three times. After the first time, it's exhausted, so the second and third iterations over it don't do anything; it's already at the end.

Using map() you create three separate iterator objects (via the call to the lambda), so this doesn't happen.
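To illustrate why the repeated iterator is "already at the end", here is a sketch that relies on the standard iterator protocol: calling iter() on an iterator returns that same iterator, whereas calling iter() on a container such as a tuple builds a new one each time. The variable names are made up for the example.

```python
import itertools as itr

repeater = itr.repeat(1, 5)
# An iterator returns itself from iter(), so every loop over it resumes
# from wherever the previous loop stopped.
iterator_is_own_iter = iter(repeater) is repeater

values = (1, 1, 1, 1, 1)
# A tuple builds a new iterator on every iter() call, so every loop
# starts again from the beginning.
tuple_gives_fresh_iter = iter(values) is not iter(values)

print(iterator_is_own_iter)    # True
print(tuple_gives_fresh_iter)  # True
```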

Upvotes: 2

Andrew Clark

Reputation: 208555

As you noted, itertools.repeat() does not copy the item it repeats; instead, the same iterable is used each time.

The map() approach works because the lambda function is called separately for each element in range(3), so you get three separate itertools.repeat(1, 5) iterables to generate the nested contents.

To do this entirely with itertools you would use itertools.tee:

import itertools as itr
itr.tee(itr.repeat(1, 5), 3)

Here is an example showing the result as lists:

>>> [list(x) for x in itr.tee(itr.repeat(1, 5), 3)]
[[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
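As a further sketch, the iterators returned by tee can be advanced independently of one another. tee returns a tuple of n iterators, and the original iterator passed to it should not be used afterwards. The names first, second, third and remaining are illustrative.

```python
import itertools as itr

# tee() returns a tuple of 3 iterators over the same underlying data.
first, second, third = itr.tee(itr.repeat(1, 5), 3)

# Advance only the first iterator by two elements.
next(first)
next(first)

# The other two are unaffected and still yield all five elements.
remaining = (len(list(first)), len(list(second)), len(list(third)))
print(remaining)  # (3, 5, 5)
```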

Upvotes: 3
