Reputation: 1482
While experimenting with functional programming in Python, I have noticed a difference between two expressions that I believe should produce the same result.
In particular, what I want to do is to have an iterable which consists of (or should I say yields?) other iterables. A simple example of what I want to do could be:
import itertools as itr
itr.repeat(itr.repeat(1,5),3)
That is, an iterable consisting of 3 iterables, each of which consists of 5 occurrences of the number 1. This is, however, not what happens. What I get instead (translated to lists) is:
[[1,1,1,1,1],[],[]]
That is, the innermost iterable is not copied (it seems); instead the same iterable is used again and again, resulting in it running out of elements.
A version of this that does work, using map, is:
import itertools as itr
map(lambda x: itr.repeat(1,5), range(3))
This produces the result I expect:
[[1,1,1,1,1],[1,1,1,1,1],[1,1,1,1,1]]
I don't understand why this works, while the method using only repeat does not. Maybe it has something to do with the fact that in the map version, the iterable coming from repeat is wrapped in a lambda, but should that make a difference? As far as I see it, the only difference between lambda x: itr.repeat(1,5) and itr.repeat(1,5) is that the first one takes an argument (which it then throws away) while the other one does not.
Upvotes: 1
Views: 463
Reputation: 71535
Your intuition is correct: the problem is repeat giving you an iterator (what I'll loosely call a generator) that keeps yielding the same object, not a copy of the object. Generator objects can only be iterated once; every time the next item in the iteration is yielded, it is permanently consumed from the generator.
The difference between lambda x: itr.repeat(1,5) and itr.repeat(1,5) is the difference between code and data. When you pass the "bare" repeat call, it has already executed and returned a generator object, and it is that generator object which is passed. When you pass the lambda, the itr.repeat(1,5) is code within a function that has not been executed yet, and it's the function that is passed. When the lambda is called, the repeat call is evaluated and returns a generator, and this happens each time the lambda is called, so you get a new generator every time.
Since map calls its argument function for each element of the collection, rather than calling it once to get an object and then using that object every time, you get separate independent generator objects. Since repeat just repeatedly yields the object you gave it originally, you get multiple references to a single generator object.
This is basically the same distinction as between these two snippets:
a = itr.repeat(1, 5)
b = itr.repeat(1, 5)
and
a = itr.repeat(1, 5)
b = a
If you call repeat once and then pass around the resulting object, there is only one generator, and consuming it from any of the places you have passed it to will be visible from all of those places. If you call repeat multiple times, then you have multiple independent generators.
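As a quick check of the aliasing point, here is a minimal sketch. The names shared, alias, first and second are only illustrative, not anything from the question:

```python
import itertools as itr

# Two names bound to ONE repeat object: consuming it through either
# name exhausts it for both.
shared = itr.repeat(1, 5)
alias = shared
shared_result = (list(shared), list(alias))

# Two separate calls create two independent repeat objects.
first = itr.repeat(1, 5)
second = itr.repeat(1, 5)
separate_result = (list(first), list(second))

print(shared_result)    # ([1, 1, 1, 1, 1], [])
print(separate_result)  # ([1, 1, 1, 1, 1], [1, 1, 1, 1, 1])
```

The first pair mirrors what the outer repeat does with its argument; the second mirrors what map does by calling the lambda each time.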
Upvotes: 0
Reputation: 279325
The difference is that itertools.repeat takes an object as its first argument, and when iterated it yields that same object multiple times. In this case, that object can only be iterated once before it is exhausted, hence the result you see.
map takes a callable object as its first argument, and it calls that object multiple times, each time yielding the result.
So, in your first code snippet there is only ever one object generating 1 five times. In your second snippet, there is one lambda object, but each time it's called it creates a new generator object generating 1 five times.
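One way to see this directly is to compare object identities. This is only a sketch; the names inner, outer and mapped are made up for the illustration, and map is wrapped in list() so it behaves the same on Python 2 and 3:

```python
import itertools as itr

inner = itr.repeat(1, 5)
outer = list(itr.repeat(inner, 3))

# repeat() hands back the very same inner object on every step.
all_same = all(item is inner for item in outer)

# map() calls the lambda once per element, so each result is a new object.
mapped = list(map(lambda _: itr.repeat(1, 5), range(3)))
all_distinct = len({id(item) for item in mapped}) == 3

outer_lists = [list(item) for item in outer]
mapped_lists = [list(item) for item in mapped]

print(all_same, all_distinct)  # True True
print(outer_lists)             # [[1, 1, 1, 1, 1], [], []]
print(mapped_lists)            # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
```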
To get what you want you would normally write either:
(itr.repeat(1,5) for _ in range(3))
to get multiple "1 five times" generators, or:
itr.repeat(tuple(itr.repeat(1,5)),3)
since a tuple, unlike the return from itr.repeat, can be iterated repeatedly.
Or of course, since this example is small you could forget about generators and just write:
((1,)*5,)*3
which is concise but a bit obscure.
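For comparison, here is a small sketch running the three alternatives side by side and converting each to lists. The variable names are illustrative only:

```python
import itertools as itr

# Generator expression: a fresh inner repeat object per outer step.
lazy = (itr.repeat(1, 5) for _ in range(3))
lazy_lists = [list(inner) for inner in lazy]

# Tuple as the repeated object: the same tuple is yielded three times,
# but a tuple can be iterated any number of times.
shared_tuple = itr.repeat(tuple(itr.repeat(1, 5)), 3)
tuple_lists = [list(inner) for inner in shared_tuple]

# Plain tuple multiplication: everything is built up front.
nested = ((1,) * 5,) * 3

print(lazy_lists)   # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
print(tuple_lists)  # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
print(nested)       # ((1, 1, 1, 1, 1), (1, 1, 1, 1, 1), (1, 1, 1, 1, 1))
```

The last two share a single inner tuple between all outer positions, which is harmless here only because tuples are immutable.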
Your problem is similar to the difference between the following:
# there is only one inner list
foo = [[]] * 3
foo[0].append(0)
foo
# [[0], [0], [0]]
# there are three separate inner lists
bar = [[] for _ in range(3)]
bar[0].append(0)
bar
# [[0], [], []]
Upvotes: 2
Reputation: 1123500
The itertools functions return iterators (often loosely called generators), and such an object can only be iterated over once. After that it is simply exhausted and will not produce its values again:
>>> import itertools as itr
>>> repeater = itr.repeat(1, 5)
>>> list(repeater)
[1, 1, 1, 1, 1]
>>> list(repeater)
[]
The map() version, on the other hand, produces new generator objects. You could have used a list comprehension too:
[itr.repeat(1, 5) for _ in range(3)]
Now each object in that list is a separate generator object that can be iterated over independently. You can test that the objects are different:
>>> repeaters = map(lambda x: itr.repeat(1,5), range(3))
>>> for pair in itr.combinations(repeaters, 2):
...     print(id(pair[0]), id(pair[1]), 'pair[0] is pair[1]', pair[0] is pair[1])
...
4557097936 4557097808 pair[0] is pair[1] False
4557097936 4557105040 pair[0] is pair[1] False
4557097808 4557105040 pair[0] is pair[1] False
Upvotes: 2
Reputation: 184280
You're repeating an iterator three times. After the first time, it's exhausted, so the second and third iterations over it don't do anything; it's already at the end.
Using map() you create three separate iterator objects (via the call to the lambda), so this doesn't happen.
Upvotes: 2
Reputation: 208555
As you noted, itertools.repeat() does not copy the item it is repeating; instead the same iterable is used each time.
The map() version works because the lambda function is called separately for each element in range(3), so you get three separate itertools.repeat(1, 5) iterables to generate the nested contents.
To do this entirely with itertools you would use itertools.tee:
import itertools as itr
itr.tee(itr.repeat(1, 5), 3)
Here is an example showing the result as lists:
>>> [list(x) for x in itr.tee(itr.repeat(1, 5), 3)]
[[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
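A short sketch of how the tee result behaves, with illustrative names (source, copies). Note that tee returns a tuple rather than a lazy outer iterator, and the itertools documentation advises against using the original iterator once it has been passed to tee:

```python
import itertools as itr

source = itr.repeat(1, 5)
copies = itr.tee(source, 3)  # a tuple of 3 independent iterators

# Consuming one copy does not exhaust the others, because tee
# buffers the items that the remaining copies still need.
first_copy = list(copies[0])
remaining = [list(c) for c in copies[1:]]

print(type(copies).__name__, len(copies))  # tuple 3
print(first_copy)                          # [1, 1, 1, 1, 1]
print(remaining)                           # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
```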
Upvotes: 3