pflaquerre

Reputation: 538

Measuring wasted time in python multiprocessing

Is there a way to measure how much time each subprocess of a multiprocessing.Pool spends waiting for input vs. how much time they spend crunching data?

Let's take this simple example that processes a file's lines in parallel:

from multiprocessing import Pool
pool = Pool(processes=16)
with open('a_large_file.txt', 'r') as f:
    for foo in pool.imap_unordered(a_slow_function, f, chunksize=100):
        do_something_with(foo)

Depending on how long a_slow_function takes, how fast we can read from f, and the chunk size, my subprocesses could end up twiddling their thumbs while waiting for data. Can I measure this?

My best guess so far is to wrap cProfile.runctx around a_slow_function. This yields one profile file per worker, which I could then compare with the total run time of the script. However, do_something_with(foo) can skew the results, so I would have to take that into account. Is there a cleaner way to do this?
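
For reference, what I have in mind is roughly the sketch below. It uses a persistent cProfile.Profile per worker instead of runctx so the stats accumulate across calls; a_slow_function is just a placeholder for the real work and the .prof file names are arbitrary:

import cProfile
import os

def a_slow_function(line):
    # placeholder for the real work
    return line.upper()

_prof = cProfile.Profile()  # one profiler per worker process

def profiled_slow_function(line):
    _prof.enable()
    try:
        return a_slow_function(line)
    finally:
        _prof.disable()
        # rewrite this worker's cumulative stats; the final write has everything
        _prof.dump_stats('worker_%d.prof' % os.getpid())

I would pass profiled_slow_function to imap_unordered instead of a_slow_function, then compare each worker_<pid>.prof against the script's total run time, minus whatever do_something_with(foo) costs.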

Note: I don't mind if the solution involves Linux-specific tools.

Upvotes: 2

Views: 476

Answers (1)

Veedrac

Reputation: 60147

You could try line_profiler to get the time spent on the

for foo in pool.imap_unordered(a_slow_function, f, chunksize=100):

line and the total time spent inside a_slow_function, and then compare the two numbers.
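
For the parent-side half of that, a minimal sketch might look like the following, assuming the loop is moved into a main() function and the default fork start method on Linux (@profile is injected by kernprof at run time, and a_slow_function is again just a placeholder):

from multiprocessing import Pool

def a_slow_function(line):
    # placeholder for the real work
    return line.upper()

@profile  # provided by kernprof, no import needed
def main():
    pool = Pool(processes=16)
    with open('a_large_file.txt', 'r') as f:
        for foo in pool.imap_unordered(a_slow_function, f, chunksize=100):
            pass  # do_something_with(foo) would go here

if __name__ == '__main__':
    main()

Running it with kernprof -l -v your_script.py shows how much time is attributed to the for foo in ... line, i.e. how long the parent sits waiting for results; timing a_slow_function itself would still have to happen inside the worker processes.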

I'm not sure if it's a good idea, but it's an idea nonetheless.


You could also just try timing things separately, such as seeing how quickly you can read the lines of the file on their own, e.g.

for line in f: pass
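
Measured with wall-clock time, that could look something like this (file name taken from the question; time.perf_counter needs Python 3.3+):

import time

t0 = time.perf_counter()
with open('a_large_file.txt', 'r') as f:
    for line in f:
        pass
print('reading alone took %.2f s' % (time.perf_counter() - t0))

If that alone takes a significant fraction of the full run time, the workers are probably being starved for input.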

Upvotes: 1
