Python multiprocessing not executing in parallel

Question

This is not my actual program but it illustrates my issue. This is the code:

import multiprocessing as mp
import subprocess
import random
O = open("test.txt","w")
for i in range(10000000):
    O.write("%s\n" % (random.randint(0,9)))
O.close()

def worker(number):
    subprocess.call("awk \'$1==%s\' test.txt> test.%s.txt" % (number,number),shell=True)
    return number

pool = mp.Pool(processes=3)
results = [pool.apply(worker, args=(x,)) for x in range(10)]
print(results)

This code works, but I noticed that the awk commands execute sequentially instead of 3 at a time. Is there anything I am missing?

David Maze · Accepted Answer

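multiprocessing.Pool.apply blocks until the result is ready, so the list comprehension in the question submits one task, waits for it to finish, and only then submits the next. The pool never has more than one task to work on at a time. As the documentation puts it: given this blocks, apply_async() is better suited for performing work in parallel.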
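As a minimal sketch of that change, the question's code could submit every task with apply_async() first and collect the results afterwards. The helper names awk_command and run_parallel are hypothetical, introduced only for illustration, and the sketch assumes test.txt has already been written as in the question.

```python
import multiprocessing as mp
import subprocess


def awk_command(number):
    # Hypothetical helper: builds the same shell command the question uses.
    return "awk '$1==%s' test.txt > test.%s.txt" % (number, number)


def worker(number):
    # Assumes test.txt exists in the current directory.
    subprocess.call(awk_command(number), shell=True)
    return number


def run_parallel(func, inputs, processes=3):
    # Hypothetical helper, not part of multiprocessing.
    with mp.Pool(processes=processes) as pool:
        # apply_async() returns an AsyncResult immediately, so every task is
        # queued before we wait on any of them.
        pending = [pool.apply_async(func, args=(x,)) for x in inputs]
        # get() blocks until that particular task has finished; results are
        # collected in submission order.
        return [p.get() for p in pending]


if __name__ == "__main__":
    print(run_parallel(worker, range(10)))
```

With processes=3 the pool should keep up to three awk commands running at once. pool.map(worker, range(10)) is a shorter way to get the same effect when every task calls the same function with one argument.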

If your core work really involves launching subprocesses rather than doing work natively in Python, you might also consider launching a bunch of subprocess.Popen objects directly from a single Python process, then calling poll() or wait() on each of them. This avoids an extra layer of processes, but collecting the subprocesses' output can be much trickier if they write to their own stdout.
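A rough sketch of the Popen approach follows. The name run_concurrently is a hypothetical helper, and the demo commands are stand-ins that start the Python interpreter rather than awk so the example runs anywhere.

```python
import subprocess
import sys


def run_concurrently(commands):
    # Hypothetical helper: each command is an argv list such as ["awk", ...].
    # Popen returns as soon as the child has started, so nothing blocks here
    # and all children run at the same time.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until that child exits and returns its exit code.
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Stand-in commands; in the question these would be the awk invocations.
    cmds = [[sys.executable, "-c", "print(%d)" % n] for n in range(3)]
    print(run_concurrently(cmds))
```

Unlike Pool(processes=3), this starts every command at once with no cap on concurrency. Limiting it to three at a time would need a loop that calls poll() on the running children and starts a new one whenever one finishes. The question's commands use shell redirection, so they would need either shell=True or a file object passed as stdout.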
