Reputation: 538
Is there a way to measure how much time each subprocess of a multiprocessing.Pool
spends waiting for input vs. how much time they spend crunching data?
Let's take this simple example that processes a file's lines in parallel:
from multiprocessing import Pool

pool = Pool(processes=16)
with open('a_large_file.txt', 'r') as f:
    for foo in pool.imap_unordered(a_slow_function, f, chunksize=100):
        do_something_with(foo)
Depending on how long a_slow_function takes, how fast we can read from f, and the chunk size, my subprocesses could end up twiddling their thumbs while waiting for data. Can I measure this?
My best guess so far is to wrap cProfile.runctx around a_slow_function. This yields one profile file per worker, which I could then compare with the total run time of the script. However, do_something_with(foo) can skew the results, so I would have to take that into account. Is there a cleaner way to do this?
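For reference, the wrapping I have in mind looks roughly like this. It's only a sketch: it uses one cProfile.Profile object per worker (rather than runctx) so the stats accumulate across calls, and the worker_<pid>.prof filename is just an illustration.

import cProfile
import os

_prof = cProfile.Profile()  # one profiler per worker process

def profiled_slow_function(line):
    _prof.enable()
    try:
        return a_slow_function(line)   # the actual worker function
    finally:
        _prof.disable()
        # Overwrite the per-PID dump on each call; the file always
        # holds the running total for this worker so far.
        _prof.dump_stats('worker_%d.prof' % os.getpid())

# and then map profiled_slow_function instead of a_slow_function:
# pool.imap_unordered(profiled_slow_function, f, chunksize=100)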
Note: I don't mind if the solution involves Linux-specific tools.
Upvotes: 2
Views: 476
Reputation: 60147
You could try line_profiler, a line-by-line profiler, to get the time spent on the
for foo in pool.imap_unordered(a_slow_function, f, chunksize=100):
line and the cumulative time spent inside a_slow_function, then compare those two numbers.
I'm not sure if it's a good idea, but it's an idea nonetheless.
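Roughly, the parent-side half of that could look like this. It's just a sketch: a_slow_function and do_something_with are the question's functions, and since line_profiler only times code running in the current process, the time inside a_slow_function would still have to be measured in the workers (e.g. with the cProfile wrapper from the question) or on a single-process sample.

from multiprocessing import Pool
from line_profiler import LineProfiler   # pip install line_profiler

def main():
    pool = Pool(processes=16)
    with open('a_large_file.txt', 'r') as f:
        for foo in pool.imap_unordered(a_slow_function, f, chunksize=100):
            do_something_with(foo)

lp = LineProfiler()
profiled_main = lp(main)   # wrap main so each of its lines is timed
profiled_main()
lp.print_stats()           # shows the total time spent on the imap_unordered line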
You could also just try timing things separately, such as seeing how quickly you can read the lines of the file on their own, e.g.:
for line in f: pass
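Something like this gives a baseline for how fast the file alone can be read (assuming the filename from the question):

import time

start = time.perf_counter()
with open('a_large_file.txt', 'r') as f:
    for line in f:
        pass
print('raw read time: %.2f s' % (time.perf_counter() - start))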
Upvotes: 1