jacob

Reputation: 518

Why is it faster to assign the iterator returned by zip() to a variable?

In the following code, time2 - time1 is always negative, i.e. the second version (which assigns the zip() result to a variable first) is faster:

import time

a = [num for num in range(int(1e6))]
b = [num for num in range(int(1e6))]

start_time = time.time()
e = [(c, d) for c, d in zip(a, b)]
time1 = time.time() - start_time
print("--- %s seconds ---" % (time1))

start_time = time.time()
_zip = zip(a, b)
e = [(c, d) for c, d in _zip]
time2 = time.time() - start_time
print("--- %s seconds ---" % (time2))

print(time2-time1)

I assume this is because, in the first case, zip() has to be called many more times than in the second. If so, why doesn't zip() just return the first element of the iterable every time it's called? Doesn't zip() create new iterators over a and b each time you call it? Does zip() hash every iterator it creates and store it for future calls with the same hash?

Is it good or bad practice to assign the result of a zip() call to a variable before iterating over it? Is the performance gain generally worth the extra line of code?

Upvotes: 1

Views: 65

Answers (1)

Karl Knechtel

Reputation: 61519

I tried wrapping the two versions of the code in functions, benchmarking them properly with timeit, and inspecting the resulting bytecode with dis:

>>> import timeit
>>> def with_assignment():
...   _zip = zip(a, b)
...   return [(c, d) for c, d in _zip]
...
>>> def without_assignment():
...   return [(c, d) for c, d in zip(a, b)]
...
>>> a, b = list(range(1000000)), list(range(1000000))
>>> timeit.timeit(with_assignment, number=100)
16.1892559
>>> timeit.timeit(without_assignment, number=100) # indeed, it's a little slower,
16.3349139
>>> timeit.timeit(with_assignment, number=100)
16.261616600000004
>>> timeit.timeit(without_assignment, number=100) # and consistently so
16.42448019999999
>>> import dis # So let's look under the hood:
>>> dis.dis(with_assignment)
  2           0 LOAD_GLOBAL              0 (zip)
              2 LOAD_GLOBAL              1 (a)
              4 LOAD_GLOBAL              2 (b)
              6 CALL_FUNCTION            2
              8 STORE_FAST               0 (_zip)

  3          10 LOAD_CONST               1 (<code object <listcomp> at 0x00000226ABCA08A0, file "<stdin>", line 3>)
             12 LOAD_CONST               2 ('with_assignment.<locals>.<listcomp>')
             14 MAKE_FUNCTION            0
             16 LOAD_FAST                0 (_zip)
             18 GET_ITER
             20 CALL_FUNCTION            1
             22 RETURN_VALUE
>>> dis.dis(without_assignment)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x00000226AD9299C0, file "<stdin>", line 2>)
              2 LOAD_CONST               2 ('without_assignment.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (zip)
              8 LOAD_GLOBAL              1 (a)
             10 LOAD_GLOBAL              2 (b)
             12 CALL_FUNCTION            2
             14 GET_ITER
             16 CALL_FUNCTION            1
             18 RETURN_VALUE
>>>

I'm afraid I don't see an obvious cause here, however. Aside from the extra STORE_FAST and LOAD_FAST (setting up and then using the local _zip), it really does seem that all the same work is being done, albeit in a different order.
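
One thing the disassembly does make clear is that zip(a, b) is called exactly once in both versions (there is a single CALL_FUNCTION 2 for it either way), so the premise that the inline version calls zip() "many more times" doesn't hold. The iterable expression of a comprehension's outermost for is evaluated once, before the loop starts. As a minimal sketch, you can count the calls directly; counting_zip and counter are hypothetical names introduced purely for this illustration, not part of the standard library or of the original code:

```python
# Hypothetical call counter, used only for this illustration.
counter = {"calls": 0}

def counting_zip(*iterables):
    """Wrap the built-in zip() and record each time it is called."""
    counter["calls"] += 1
    return zip(*iterables)

a = list(range(1000))
b = list(range(1000))

# Version 1: zip call written inline in the comprehension.
inline = [(c, d) for c, d in counting_zip(a, b)]
calls_inline = counter["calls"]

# Version 2: zip result assigned to a variable first.
counter["calls"] = 0
_zip = counting_zip(a, b)
assigned = [(c, d) for c, d in _zip]
calls_assigned = counter["calls"]

print(calls_inline, calls_assigned)  # 1 1
print(inline == assigned)            # True
```

Both versions call the wrapper once and build the same list, which is consistent with the timings above differing by only about 1%. With a gap that small, I'd pick whichever form reads better rather than add the extra line for speed.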

Upvotes: 1
