Reputation: 538
Is there a way to measure how much time each subprocess of a multiprocessing.Pool
spends waiting for input vs. how much time they spend crunching data?
Let's take this simple example that processes a file's lines in parallel:
from multiprocessing import Pool

pool = Pool(processes=16)
with open('a_large_file.txt', 'r') as f:
    for foo in pool.imap_unordered(a_slow_function, f, chunksize=100):
        do_something_with(foo)
Depending on how long a_slow_function takes, how fast we can read from f, and the chunk size, my subprocesses could end up twiddling their thumbs while waiting for data. Can I measure this?
My best guess so far is to wrap cProfile.runctx around a_slow_function. This yields one profile file per worker, which I could then compare with the total run time of the script. However, do_something_with(foo) can skew the results, so I would have to take that into account. Is there a cleaner way to do this?
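For reference, the wrapping I have in mind looks roughly like this. It's only a sketch: it uses one cProfile.Profile object per worker (rather than runctx) so the stats accumulate across calls, and the worker_<pid>.prof filename is just an illustration.

import cProfile
import os

_prof = cProfile.Profile()  # one profiler per worker process

def profiled_slow_function(line):
    _prof.enable()
    try:
        return a_slow_function(line)   # the actual worker function
    finally:
        _prof.disable()
        # Overwrite the per-PID dump on each call; the file always
        # holds the running total for this worker so far.
        _prof.dump_stats('worker_%d.prof' % os.getpid())

# and then map profiled_slow_function instead of a_slow_function:
# pool.imap_unordered(profiled_slow_function, f, chunksize=100)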
Note: I don't mind if the solution involves Linux-specific tools.
Upvotes: 2
Views: 476
Reputation: 60147
You could try line_profiler, a line-by-line profiler, to get the time spent on the
for foo in pool.imap_unordered(a_slow_function, f, chunksize=100):
line and the cumulative time spent inside a_slow_function, then compare those two numbers.
I'm not sure if it's a good idea, but it's an idea nonetheless.
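Roughly, the parent-side half of that could look like this. It's just a sketch: a_slow_function and do_something_with are the question's functions, and since line_profiler only times code running in the current process, the time inside a_slow_function would still have to be measured in the workers (e.g. with the cProfile wrapper from the question) or on a single-process sample.

from multiprocessing import Pool
from line_profiler import LineProfiler   # pip install line_profiler

def main():
    pool = Pool(processes=16)
    with open('a_large_file.txt', 'r') as f:
        for foo in pool.imap_unordered(a_slow_function, f, chunksize=100):
            do_something_with(foo)

lp = LineProfiler()
profiled_main = lp(main)   # wrap main so each of its lines is timed
profiled_main()
lp.print_stats()           # shows the total time spent on the imap_unordered line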
You could also just try timing things separately, such as seeing how quickly you can read the lines of the file on their own, e.g.:
for line in f: pass
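Something like this gives a baseline for how fast the file alone can be read (assuming the filename from the question):

import time

start = time.perf_counter()
with open('a_large_file.txt', 'r') as f:
    for line in f:
        pass
print('raw read time: %.2f s' % (time.perf_counter() - start))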
Upvotes: 1