Reputation: 8127
I'm benchmarking this script on a 6-core CPU with Ubuntu 22.04.1 and Python 3.10.6. It is supposed to show usage of all available CPU cores with the par function vs. a single core with the ser function.
import numpy as np
from multiprocessing import Pool
import timeit as ti

def foo(n):
  return -np.sort(-np.arange(n))[-1]

def par(reps, bigNum, pool):
  for i in range(bigNum, bigNum+reps):
    pool.apply_async(foo, args=(i,))

def ser(reps, bigNum):
  for i in range(bigNum, bigNum+reps):
    foo(i)

if __name__ == '__main__':
  bigNum = 9_000_000
  reps = 6

  fun = f'par(reps, bigNum, pool)'
  t = 1000 * np.array(ti.repeat(stmt=fun, setup='pool=Pool(reps);'+fun, globals=globals(), number=1, repeat=10))
  print(f'{fun}: {np.amin(t):6.3f}ms {np.median(t):6.3f}ms')

  fun = f'ser(reps, bigNum)'
  t = 1000 * np.array(ti.repeat(stmt=fun, setup=fun, globals=globals(), number=1, repeat=10))
  print(f'{fun}: {np.amin(t):6.3f}ms {np.median(t):6.3f}ms')
Right now, the par function only measures the time needed to dispatch the jobs to the worker processes. What do I need to change in the par function so that it waits for all worker processes to complete before returning? Note that I would like to reuse the process pool between calls.
Upvotes: 1
Views: 85
Reputation: 17516
You need to keep the AsyncResult objects returned by apply_async and call get() on them to wait for the jobs to finish.
def par(reps, bigNum, pool):
    jobs = []
    for i in range(bigNum, bigNum+reps):
        jobs.append(pool.apply_async(foo, args=(i,)))
    for job in jobs:
        job.get()
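If foo returned a value you cared about, the same pattern collects the results as well; get() blocks until each job finishes and re-raises any exception from the worker. A minimal sketch of that variant (not part of the original answer):

def par(reps, bigNum, pool):
    # submit all jobs first so they run concurrently across the pool's workers
    jobs = [pool.apply_async(foo, args=(i,)) for i in range(bigNum, bigNum + reps)]
    # get() waits for each job and re-raises any worker-side exception,
    # so par only returns once all submitted work has completed
    return [job.get() for job in jobs]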
For long loops you should use map, imap, or imap_unordered instead of apply_async: they have less overhead, you get to control the chunksize for faster serialization of small objects, and you can pass generators to them to save memory or even allow infinite generators (with imap).
def par(reps, bigNum, pool):
    pool.map(foo, range(bigNum, bigNum+reps), chunksize=1)
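For the generator case mentioned above, a minimal sketch (not from the original answer; par_unordered is a name I made up) using imap_unordered, which consumes its input lazily and yields results as workers finish them:

def par_unordered(reps, bigNum, pool):
    # lazy input: nothing is materialized up front, so this also works
    # for very long streams of work items
    sizes = (bigNum + i for i in range(reps))
    for result in pool.imap_unordered(foo, sizes, chunksize=1):
        pass  # iterating drains the pool; the loop ends only when all jobs are done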
Note: PEP 8 indentation in Python is 4 spaces, not 2.
Upvotes: 1