rednafi
rednafi

Reputation: 1731

Performance discrepancies between itertools.product & list comprehension

I was exploring Python's itertools module and came across the itertools.product function that returns the same as ((x,y) for x in A for y in B). I find it to be a really neat way of reducing nestings in complex for-loops where list comprehension might be inadequate. However, before proceeding, I wanted to check how it performs with the alternative methods. Here're a few tests that I conducted. Used Jupyter Notebook's built-in %%timeit to measure the performance.

case-1: vanilla list comprehension

%%timeit -n 50 -r 5

[(x,y) for x in range(1000) if x%2==0 for y in range(1000) if y%2==1]
>>> 35.8 ms ± 1.3 ms per loop (mean ± std. dev. of 5 runs, 50 loops each)

case-2: itertools.product in a list comprehension

Removed itertools import to avoid import time showing up here.

%%timeit -n 50 -r 5

[(x,y) for (x,y) in itertools.product(range(1000), range(1000)) if x%2==0 and y%2==1]
>>> 62.1 ms ± 1.16 ms per loop (mean ± std. dev. of 5 runs, 50 loops each)

case-3: vanilla nested for loop

%%timeit -n 50 -r 5

lst = []
for x in range(1000):
    for y in range(1000):
       if x%2 == 0 and y%2 == 1:
           lst.append((x,y))
>>> 72 ms ± 769 µs per loop (mean ± std. dev. of 5 runs, 50 loops each)

case-4: for loop with itertools.product

%%timeit -n 50 -r 5

lst = []
for x, y in itertools.product(range(1000),range(1000)):
    if x%2==0 and y%2==1:
        lst.append((x,y))
>>> 74.5 ms ± 2.13 ms per loop (mean ± std. dev. of 5 runs, 50 loops each)

However, this part of the documentation, I suppose, claims better performance than vanilla for loops. Also, shouldn't case-2 be faster than case-1? In case-3 and case-4, the performance difference gets worse for itertools.product as the sizes of the iterables increase. What is going on here? Also, please add a few examples where itertools.product might be a better candidate than listcomp or nested for loops.

Upvotes: 0

Views: 536

Answers (1)

ForceBru
ForceBru

Reputation: 44878

You're comparing different things:

[(x,y) for x in range(1000) if x%2==0 for y in range(1000) if y%2==1]

...is not the same as

[(x,y) for x in range(1000) for y in range(1000) if x%2==0 and y%2==1]

The first one skips the second loop entirely if x%2 != 0, the second one loops through all 1000 ** 2 == 1,000,000 combinations. Cases 2 to 4 are in the same category as the second comprehension here, so they are inherently slower.

Upvotes: 3

Related Questions