Reputation: 88168
From a previous question I learned something interesting. If Python's itertools.product
is fed a series of iterators, these iterators will be converted into tuples before the Cartesian product begins. Related questions look at the source code of itertools.product
to conclude that, while no intermediate results are stored in memory, tuple versions of the original iterators are created before the product iteration begins.
Question: Is there a way to create an iterator to a Cartesian product when the (tuple converted) inputs are too large to hold in memory? Trivial example:
import itertools
A = itertools.permutations(xrange(100))
itertools.product(A)
A more practical use case would take in a series of (*iterables[, repeat])
like the original implementation of the function - the above is just an example. It doesn't look like you can use the current implementation of itertools.product
, so I welcome in submission in pure python (though you can't beat the C backend of itertools
!).
Upvotes: 17
Views: 5488
Reputation: 18
I'm sorry to up this topic but after spending hours debugging a program trying to iterate over recursively generated cartesian product of generators. I can tell you that none of the solutions above work if not working with constant numbers as in all the examples above.
Correction :
from itertools import tee
def product(*iterables, **kwargs):
if len(iterables) == 0:
yield ()
else:
iterables = iterables * kwargs.get('repeat', 1)
it = iterables[0]
for item in it() if callable(it) else iter(it):
iterables_tee = list(map(tee, iterables[1:]))
iterables[1:] = [it1 for it1, it2 in iterables_tee]
iterable_copy = [it2 for it1, it2 in iterables_tee]
for items in product(*iterable_copy):
yield (item, ) + items
If your generators contain generators, you need to pass a copy to the recursive call.
Upvotes: 0
Reputation: 151027
This can't be done with standard Python generators, because some of the iterables must be cycled through multiple times. You have to use some kind of datatype capable of "reiteration." I've created a simple "reiterable" class and a non-recursive product algorithm. product
should have more error-checking, but this is at least a first approach. The simple reiterable class...
class PermutationsReiterable(object):
def __init__(self, value):
self.value = value
def __iter__(self):
return itertools.permutations(xrange(self.value))
And product
iteslf...
def product(*reiterables, **kwargs):
if not reiterables:
yield ()
return
reiterables *= kwargs.get('repeat', 1)
iterables = [iter(ri) for ri in reiterables]
try:
states = [next(it) for it in iterables]
except StopIteration:
# outer product of zero-length iterable is empty
return
yield tuple(states)
current_index = max_index = len(iterables) - 1
while True:
try:
next_item = next(iterables[current_index])
except StopIteration:
if current_index > 0:
new_iter = iter(reiterables[current_index])
next_item = next(new_iter)
states[current_index] = next_item
iterables[current_index] = new_iter
current_index -= 1
else:
# last iterable has run out; terminate generator
return
else:
states[current_index] = next_item
current_index = max_index
yield tuple(states)
Tested:
>>> pi2 = PermutationsReiterable(2)
>>> list(pi2); list(pi2)
[(0, 1), (1, 0)]
[(0, 1), (1, 0)]
>>> list(product(pi2, repeat=2))
[((0, 1), (0, 1)), ((0, 1), (1, 0)), ((1, 0), (0, 1)), ((1, 0), (1, 0))]
>>> giant_product = product(PermutationsReiterable(100), repeat=5)
>>> len(list(itertools.islice(giant_product, 0, 5)))
5
>>> big_product = product(PermutationsReiterable(10), repeat=2)
>>> list(itertools.islice(big_product, 0, 5))
[((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)),
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 9, 8)),
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 7, 9)),
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 9, 7)),
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 9, 7, 8))]
Upvotes: 1
Reputation: 26501
Without "iterator recreation", it may be possible for the first of the factors. But that would save only 1/n space (where n is the number of factors) and add confusion.
So the answer is iterator recreation. A client of the function would have to ensure that the creation of the iterators is pure (no side-effects). Like
def iterProduct(ic):
if not ic:
yield []
return
for i in ic[0]():
for js in iterProduct(ic[1:]):
yield [i] + js
# Test
x3 = lambda: xrange(3)
for i in iterProduct([x3,x3,x3]):
print i
Upvotes: 2
Reputation: 157374
Here's an implementation which calls callables and iterates iterables, which are assumed restartable:
def product(*iterables, **kwargs):
if len(iterables) == 0:
yield ()
else:
iterables = iterables * kwargs.get('repeat', 1)
it = iterables[0]
for item in it() if callable(it) else iter(it):
for items in product(*iterables[1:]):
yield (item, ) + items
Testing:
import itertools
g = product(lambda: itertools.permutations(xrange(100)),
lambda: itertools.permutations(xrange(100)))
print next(g)
print sum(1 for _ in g)
Upvotes: 8