Biel Cardona
Biel Cardona

Reputation: 103

Is there a way to efficiently compute the product of two (or more) iterators?

I am trying to compute "efficiently" the product of two iterators. Each of them takes a little to yield each result, and there are a lot of yielded results. Since it seems that itertools.product first computes all items, it takes quite a lot to get the first pair.

A MCVE is:

import time
from itertools import product

def costlygen(n):
    for i in range(n):
        time.sleep(1)
        yield i

g1 = costlygen(5)
g2 = costlygen(5)

now = time.time()
g = product(g1,g2)

for x in g:
    print(x)
    print(time.time()-now)

The output is:

(0, 0)
10.027392148971558
(0, 1)
10.027477979660034
(0, 2)
10.027528285980225
...
(4, 3)
10.028220176696777
(4, 4)
10.028250217437744

From the results it is clear that product computes all the items generated by each of the generators, and hence the first result is only yielded after 10 seconds, when it could have been yielded after only 2 seconds.

Is there any way to get the results as soon as they are produced?

Upvotes: 3

Views: 80

Answers (1)

sanyassh
sanyassh

Reputation: 8520

There is one possible solution which uses caching via gone list:

import time
from itertools import product

def costlygen(n):
    for i in range(n):
        time.sleep(1)
        yield i

def simple_product(it1, it2):
    gone = []
    x = next(it1)
    for y in it2:
        gone.append(y)
        yield x, y
    for x in it1:
        for y in gone:
            yield x, y

def complex_product(*iterables):
    if len(iterables) == 2:
        yield from simple_product(*iterables)
        return
    it1, *rest = iterables
    gone = []
    x = next(it1)
    for t in complex_product(*rest):
        gone.append(t)
        yield (x,) + t
    for x in it1:
        for t in gone:
            yield (x,) + t

g1 = costlygen(5)
g2 = costlygen(5)
g3 = costlygen(5)

now = time.time()
g = complex_product(g1,g2,g3)

for x in g:
    print(x)
    print(time.time()-now)

Timings:

(0, 0, 0)
3.002698898315429  # as soon as possible
(0, 0, 1)
4.003920316696167  # after one second
(0, 0, 2)
5.005135536193848
(0, 0, 3)
6.006361484527588
(0, 0, 4)
7.006711721420288
(0, 1, 0)
8.007975101470947
(0, 1, 1)
8.008066892623901  # third gen was already gone, so (*, *, 1) will be produced instantly after (*, *, 0)
(0, 1, 2)
8.008140802383423
(0, 1, 3)
8.00821304321289
(0, 1, 4)
8.008255004882812
(0, 2, 0)
9.009203910827637

Upvotes: 3

Related Questions