Reputation: 57
This is not my actual program but it illustrates my issue. This is the code:
import multiprocessing as mp
import subprocess
import random
O = open("test.txt","w")
for i in range(10000000):
    O.write("%s\n" % (random.randint(0,9)))
O.close()
def worker(number):
    subprocess.call("awk \'$1==%s\' test.txt> test.%s.txt" % (number,number),shell=True)
    return number
pool = mp.Pool(processes=3)
results = [pool.apply(worker, args=(x,)) for x in range(10)]
print(results)
This code works, but I noticed that the awk commands execute sequentially instead of 3 at a time. Is there anything I am missing?
Upvotes: 0
Views: 1100
Reputation: 160073
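pool.apply() blocks until the result is ready. Because it blocks, each call in your list comprehension finishes before the next one is submitted, so only one worker is ever busy. apply_async() is better suited for performing work in parallel.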
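As a rough sketch of that change, assuming the same worker and test.txt as in the question: submit every task with apply_async() first, then collect the results. The helper name run_parallel is made up for this example and is not part of multiprocessing.

```python
import multiprocessing as mp
import subprocess


def worker(number):
    # Same shell command as in the question: keep lines whose first field equals number.
    subprocess.call("awk '$1==%s' test.txt > test.%s.txt" % (number, number), shell=True)
    return number


def run_parallel(numbers, processes=3):
    # run_parallel is a hypothetical helper name used only in this sketch.
    with mp.Pool(processes=processes) as pool:
        # apply_async() returns an AsyncResult right away, so every task is
        # queued before we wait on any of them.
        pending = [pool.apply_async(worker, args=(x,)) for x in numbers]
        # get() blocks, but the pool is already working through the queue
        # with up to `processes` workers at a time.
        return [p.get() for p in pending]


if __name__ == "__main__":
    print(run_parallel(range(10)))
```

The results come back in submission order because get() is called on the AsyncResult objects in the order they were created. pool.map(worker, range(10)) should give the same effect with less code.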
If your core work really involves launching subprocesses rather than doing work natively in Python, you might also consider launching a bunch of subprocess.Popen objects directly from a single Python process, then calling poll() and wait() on each of them. This saves a layer of processes, but it can be much trickier to collect the outputs of the subprocesses if they're writing things to their own stdout.
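A minimal sketch of the Popen approach, assuming shell commands like the awk one in the question and a cap of 3 children at a time. run_commands and max_running are names invented for this example. It uses poll() only; calling wait() on each child would be the simpler, blocking alternative if you do not need the cap.

```python
import subprocess
import time


def run_commands(commands, max_running=3):
    # run_commands and max_running are hypothetical names for this sketch.
    pending = list(commands)
    running = []
    return_codes = []
    while pending or running:
        # Start new children while there is a free slot.
        while pending and len(running) < max_running:
            running.append(subprocess.Popen(pending.pop(0), shell=True))
        # poll() returns None while a child is still running.
        still_running = []
        for proc in running:
            if proc.poll() is None:
                still_running.append(proc)
            else:
                return_codes.append(proc.returncode)
        running = still_running
        if running:
            time.sleep(0.05)  # short pause to avoid a busy loop
    # Return codes are listed in completion order, not submission order.
    return return_codes


if __name__ == "__main__":
    cmds = ["awk '$1==%s' test.txt > test.%s.txt" % (n, n) for n in range(10)]
    print(run_commands(cmds))
```

Because each command redirects its own output to a file, nothing needs to be read from the children's stdout here, which sidesteps the output-collection problem mentioned above.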
Upvotes: 2