user3715117

Reputation: 57

Python multiprocessing not executing in parallel

This is not my actual program but it illustrates my issue. This is the code:

import multiprocessing as mp
import subprocess
import random
O = open("test.txt","w")
for i in range(10000000):
    O.write("%s\n" % (random.randint(0,9)))
O.close()

def worker(number):
    subprocess.call("awk \'$1==%s\' test.txt> test.%s.txt" % (number,number),shell=True)
    return number

pool = mp.Pool(processes=3)
results = [pool.apply(worker, args=(x,)) for x in range(10)]
print(results)

This code works, but I noticed that the awk commands execute sequentially instead of three at a time. Is there anything I am missing?

Upvotes: 0

Views: 1100

Answers (1)

David Maze

Reputation: 160073

multiprocessing.Pool.apply...

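blocks until the result is ready. Given this blocks, apply_async() is better suited for performing work in parallel.

In your list comprehension, each pool.apply() call therefore has to return before the next one is submitted, so only one awk command runs at a time even though the pool has three worker processes.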
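As a rough sketch of what switching to apply_async() could look like for the code in the question: every task is submitted first, and the results are collected only afterwards. The run_all helper is a name made up for this illustration, and test.txt is assumed to exist already.

```python
import multiprocessing as mp
import subprocess


def worker(number):
    # Same awk filter as in the question; test.txt is assumed to exist already.
    subprocess.call("awk '$1==%s' test.txt > test.%s.txt" % (number, number), shell=True)
    return number


def run_all(numbers, processes=3):
    # run_all is a hypothetical helper name, not part of multiprocessing.
    with mp.Pool(processes=processes) as pool:
        # apply_async() returns an AsyncResult right away, so every task is
        # queued before we wait on any of them.
        pending = [pool.apply_async(worker, args=(x,)) for x in numbers]
        # get() blocks per result, but the pool is already running up to
        # `processes` workers at once in the background.
        return [p.get() for p in pending]


if __name__ == "__main__":
    print(run_all(range(10)))
```

If you do not need the individual AsyncResult objects, pool.map(worker, range(10)) should give the same list of results while still spreading the calls across the pool.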

If your core work really involves launching subprocesses rather than doing work natively in Python, you might also consider launching a bunch of subprocess.Popen objects directly from a single Python process, then calling poll() and wait() on each of them. This saves a layer of processes, but it can be much trickier to collect the subprocesses' outputs if they write to their own stdout.
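A minimal sketch of the Popen approach, assuming awk is on the PATH and test.txt already exists. The launch_all helper is a made-up name for this illustration. Note that it starts every command at once rather than three at a time, so you would need to add your own throttling to cap concurrency. Sending each child's stdout straight to its own file sidesteps the output-collection problem for this particular case.

```python
import subprocess


def launch_all(numbers):
    # launch_all is a hypothetical helper name used only for this sketch.
    procs = []
    for n in numbers:
        # Each child writes directly to its own output file. The parent can
        # close its copy of the handle as soon as the child has started.
        with open("test.%s.txt" % n, "w") as out:
            p = subprocess.Popen(["awk", "$1==%s" % n, "test.txt"], stdout=out)
        procs.append((n, p))

    # All children are now running concurrently; wait for each in turn
    # and record its exit code.
    return {n: p.wait() for n, p in procs}


if __name__ == "__main__":
    print(launch_all(range(10)))
```

Passing the command as a list avoids shell=True and the quoting around the awk program.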

Upvotes: 2
