Reputation: 121
I have the following code
import multiprocessing as mp
import os

def funct(name):
    if nameisvalid:
        do_some_stuff_and_save_a_file
        return 1
    else:
        return 0

num_proc = 20  # or a call to Slurm/mp for the number of processors
pool = mp.Pool(processes=num_proc)
results = pool.map_async(funct, [n for n in nameindex])
pool.close()
pool.join()
I have run this on my desktop with a 6-core processor with num_proc=mp.cpu_count(), and it works fine and fast. But when I try to run this script in an sbatch script on our processing cluster with -N 1 -n 20 (our nodes each have 24 processors), or with any other number of processors, it runs incredibly slowly and only appears to utilize 10-15 processors. Is there some way to optimize multiprocessing for working with Slurm?
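For reference, by "a call to slurm" for the number of processors I mean something like the following sketch, which reads Slurm's SLURM_CPUS_ON_NODE environment variable if it is set and falls back to mp.cpu_count() otherwise (the helper name is just illustrative):

import multiprocessing as mp
import os

def allocated_cpus():
    # Use the CPU count Slurm allocated on this node if available,
    # otherwise fall back to the machine's own CPU count.
    return int(os.environ.get("SLURM_CPUS_ON_NODE", mp.cpu_count()))

num_proc = allocated_cpus()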
Upvotes: 1
Views: 1320
Reputation: 121
funct checked the disk for a specific file, then loaded a file, then did work, then saved a file. This caused my individual processes to wait on input/output operations instead of working. So I loaded all of the initial data before passing it to the pool, and added a Process from multiprocessing dedicated to saving files from a Queue that the pooled processes put their output into, so there is only ever one process trying to save.
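A minimal sketch of that arrangement (load_input, process, and save_output are hypothetical stand-ins for the real load/compute/save steps; a Manager queue is used because it can be passed to Pool workers as an argument):

import multiprocessing as mp

def load_input(name):
    # Stand-in for reading the initial data from disk.
    return "data for %s" % name

def process(data):
    # Stand-in for the CPU-bound work; no disk access here.
    return data.upper()

def save_output(name, result):
    # Stand-in for writing one output file.
    with open(name + ".out", "w") as f:
        f.write(result)

def worker(name, data, out_queue):
    # Pool workers only compute and hand their result to the writer.
    out_queue.put((name, process(data)))
    return 1

def writer(out_queue):
    # Dedicated saver: the only process that writes output files.
    while True:
        item = out_queue.get()
        if item is None:      # sentinel: no more results are coming
            break
        save_output(*item)

if __name__ == "__main__":
    names = ["a", "b", "c"]   # stands in for nameindex
    inputs = [(n, load_input(n)) for n in names]   # load all data up front

    manager = mp.Manager()
    out_queue = manager.Queue()   # proxy queue, safe to pass to Pool workers

    saver = mp.Process(target=writer, args=(out_queue,))
    saver.start()

    pool = mp.Pool(processes=20)
    pool.starmap(worker, [(n, d, out_queue) for n, d in inputs])
    pool.close()
    pool.join()

    out_queue.put(None)       # tell the writer to finish
    saver.join()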
Upvotes: 1