Reputation: 621
Note: I don't need any communication between the processes/threads; I'm only interested in a completion signal (that's why I'm posting this as a new question, since all the other examples I've found involve communication between the workers).
How can I use the multiprocessing package in Python 3 to parallelize the following piece of code (the end goal is to make it run faster):
a = 123
b = 456
for id in ids:  # len(ids) = 10'000
    # executes a binary with CLI flags, i.e. runs
    # "./hello_world_exec --id id --a a --b b", which takes about 30 seconds on average
    run_binary_with_id(id, a, b)
I tried the following:
import multiprocessing as mp

def run_binary_with_id(id, a, b):
    run_command('./hello_world_exec --id {} --a {} --b {}'.format(id, a, b))

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    a = 123
    b = 456
    ids = range(10000)
    for id in ids:
        p = ctx.Process(target=run_binary_with_id, args=(id, a, b))
        p.start()
        p.join()
    # The binary was executed len(ids) times; do other stuff assuming everything's completed at this point
or
for id in ids:
    pool.apply_async(run_binary_with_id, (id, a, b))
In a similar question the answer is the following:
from collections import deque

def consume(iterator):
    deque(iterator, maxlen=0)

x = pool.imap_unordered(f, ((i, j) for i in range(10000) for j in range(10000)))
consume(x)
which I don't really understand at all (why do I need this consume()?).
Upvotes: 1
Views: 63
Reputation: 107124
Trying to spawn 10,000 processes to run in parallel is almost certainly going to overload your system and make it run slower than running the processes sequentially, because of the overhead of the OS constantly context-switching between processes when the number of processes far exceeds the number of CPUs/cores your system has.
You can instead use multiprocessing.Pool to limit the number of worker processes spawned for the task. By default the Pool constructor uses as many worker processes as your system has cores, but you can fine-tune that with the processes parameter. You can then use its map method to map a sequence of arguments onto a given function and run the calls in parallel. map only passes a single argument to the function, however, so you would have to use functools.partial to supply default values for the other arguments, which in your case do not change between calls:
import multiprocessing as mp
from functools import partial

if __name__ == '__main__':
    _run_binary_with_id = partial(run_binary_with_id, a=123, b=456)
    with mp.Pool() as pool:
        pool.map(_run_binary_with_id, range(10000))
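Since run_command isn't defined in your snippet, here's a minimal sketch of how run_binary_with_id could be written with the standard-library subprocess module (the ./hello_world_exec binary and its flags are just the ones from your question):
import subprocess

def run_binary_with_id(id, a, b):
    # Run the binary with its CLI flags and wait for it to finish;
    # check=True raises CalledProcessError if it exits with a non-zero status.
    subprocess.run(
        ['./hello_world_exec', '--id', str(id), '--a', str(a), '--b', str(b)],
        check=True,
    )
If you'd rather avoid functools.partial, Pool.starmap accepts an iterable of argument tuples instead, e.g. pool.starmap(run_binary_with_id, ((id, 123, 456) for id in range(10000))).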
Upvotes: 1