Arcturus B
Arcturus B

Reputation: 5621

Why are Python operations 30× slower after calling time.sleep or subprocess.Popen?

Consider the following loop:

for i in range(20):
    if i == 10:
        subprocess.Popen(["echo"]) # command 1
    t_start = time.time()
    1+1 # command 2
    t_stop = time.time()
    print(t_stop - t_start)

“command 2” command takes systematically longer to run when “command 1” is run before it. The following plot shows the execution time of 1+1 as a function of the loop index i, averaged over 100 runs.

Execution of 1+1 is 30 times slower when preceded by subprocess.Popen.

Execution time of 1+1 as a function of loop index


It gets even weirder. One may think that only the first command run after subprocess.Popen() is affected, but it is not the case. The following loop shows that all commands in the current loop iteration are affected. But the subsequent loops iterations seem to be mostly OK.

var = 0
for i in range(20):
    if i == 10:
      # command 1
      subprocess.Popen(['echo'])
    # command 2a
    t_start = time.time()
    1 + 1
    t_stop = time.time()
    print(t_stop - t_start)
    # command 2b
    t_start = time.time()
    print(1)
    t_stop = time.time()
    print(t_stop - t_start)
    # command 2c
    t_start = time.time()
    var += 1
    t_stop = time.time()
    print(t_stop - t_start)

Here’s a plot of the execution times for this loop, average over 100 runs:

Execution times of 1+1, print(1), and var += 1 as a function of loop index


More remarks:


Question: What may cause this effect, and why does it affect only the current loop iteration?

I suspect that it could be related to context switching, but this doesn’t seem to explain why a whole loop iteration would affected. If context switching is indeed the cause, why do some commands trigger it while others don’t?

Upvotes: 15

Views: 1056

Answers (1)

Sam Mason
Sam Mason

Reputation: 16174

my guess would be that this is due to the Python code being evicted from various caches in the CPU/memory system

the perflib package can be used to extract more detailed CPU level stats about the state of the cache — i.e. the number of hits/misses.

I get ~5 times the LIBPERF_COUNT_HW_CACHE_MISSES counter after the Popen() call:

from subprocess import Popen, DEVNULL
from perflib import PerfCounter
import numpy as np

arr = []
p = PerfCounter('LIBPERF_COUNT_HW_CACHE_MISSES')                                                        

for i in range(100):
  ti = []
  p.reset()
  p.start()
  ti.extend(p.getval() for _ in range(7))
  Popen(['echo'], stdout=DEVNULL)
  ti.extend(p.getval() for _ in range(7))
  p.stop()
  arr.append(ti)


np.diff(np.array(arr), axis=1).mean(axis=0).astype(int).tolist()                                                

gives me:

 2605,  2185,  2127,  2099,  2407,  2120,
5481210,
16499, 10694, 10398, 10301, 10206, 10166

(lines broken in non-standard places to indicate code flow)

Upvotes: 8

Related Questions