Reputation: 1741
I am processing some ASCII data, performing some operations on it, and then writing everything back to another file (this job is done by post_processing_0.main, which returns nothing). I want to parallelize the code with the multiprocessing module; see the following code snippet:
```python
from multiprocessing import Pool
import post_processing_0

def chunks(lst, n):
    # Split lst into n interleaved sublists: lst[0::n], lst[1::n], ...
    return [lst[i::n] for i in range(n)]

def main():
    pool = Pool(processes=proc_num)
    P = {}
    for i in range(0, proc_num):
        P['process_' + str(i)] = pool.apply_async(post_processing_0.main, [split_list[i]])
    pool.close()
    pool.join()

proc_num = 8
timesteps = 100
list_to_do = range(0, timesteps)
split_list = chunks(list_to_do, proc_num)

if __name__ == '__main__':  # required where worker processes re-import this module (e.g. Windows)
    main()
```
I have read about the difference between map_async and apply_async, but I don't understand it very well. Is my use of the multiprocessing module correct?
In this case, should I use map_async or apply_async? And why?
Edit:
I don't think this is a duplicate of the question "Python multiprocessing.Pool: when to use apply, apply_async or map?". The answers there focus on the order of the results obtained from the two functions. Here I am asking: what is the difference when nothing is returned?
Upvotes: 15
Views: 18437
Reputation: 94871
I would recommend map_async for three reasons:

1. It's cleaner-looking code. This:

```python
pool = Pool(processes=proc_num)
async_result = pool.map_async(post_processing_0.main, split_list)
pool.close()
pool.join()
```

looks nicer than this:

```python
pool = Pool(processes=proc_num)
P = {}
for i in range(0, proc_num):
    P['process_' + str(i)] = pool.apply_async(post_processing_0.main, [split_list[i]])
pool.close()
pool.join()
```

2. With apply_async, if an exception occurs inside post_processing_0.main, you won't know about it unless you explicitly call P['process_x'].get() on the failing AsyncResult object, which would require iterating over all of P. With map_async, the exception will be raised when you call async_result.get(); no iteration is required.

3. map_async has built-in chunking (its chunksize parameter), which can make your code perform noticeably better if split_list is very large.
Other than that, the behavior is basically the same if you don't care about the results.
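To make the exception-handling difference concrete, here is a minimal sketch, assuming Python 3. The worker `work` is hypothetical: it stands in for post_processing_0.main, returns nothing, and raises on one particular timestep. In both versions `join()` returns normally; the failure only surfaces through `get()`, once per AsyncResult with apply_async but with a single call with map_async.

```python
from multiprocessing import Pool


def work(chunk):
    """Hypothetical stand-in for post_processing_0.main: returns nothing,
    but fails on one particular timestep."""
    for t in chunk:
        if t == 3:
            raise ValueError("bad timestep %d" % t)


def run_apply_async(chunks):
    pool = Pool(processes=2)
    results = [pool.apply_async(work, (c,)) for c in chunks]
    pool.close()
    pool.join()  # returns normally even though one job raised
    errors = []
    for r in results:  # every AsyncResult must be checked individually
        try:
            r.get()  # re-raises the worker's exception in the parent
        except ValueError as exc:
            errors.append(str(exc))
    return errors


def run_map_async(chunks):
    pool = Pool(processes=2)
    async_result = pool.map_async(work, chunks)
    pool.close()
    pool.join()
    try:
        async_result.get()  # one call surfaces the failure of any chunk
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    chunks = [[0, 1], [2, 3], [4, 5]]
    print(run_apply_async(chunks))  # ['bad timestep 3']
    print(run_map_async(chunks))    # bad timestep 3
```

On Python 3, both apply_async and map_async also accept an error_callback argument, which is called with the exception when a job fails, so you can be notified without calling get() at all.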
Upvotes: 16
Reputation: 249123
apply_async submits a single job to the pool. map_async submits multiple jobs, calling the same function with different arguments. The former takes a function plus an argument list; the latter takes a function plus an iterable (e.g. a list) whose items are passed as the argument, one item per call. map_async can only call unary functions (i.e. functions taking one argument).
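If the worker needs more than one argument, two common workarounds are sketched here, assuming Python 3.3+ (for starmap_async and for using Pool as a context manager). `scale` is a hypothetical two-argument function used only for illustration: functools.partial fixes the extra argument to produce a one-argument callable, while starmap_async unpacks each tuple into positional arguments.

```python
from functools import partial
from multiprocessing import Pool


def scale(timestep, factor):
    """Hypothetical two-argument worker; map_async alone cannot pass both."""
    return timestep * factor


def with_partial(timesteps, factor):
    # partial() binds `factor`, leaving a unary callable that map_async accepts.
    with Pool(processes=2) as pool:
        return pool.map_async(partial(scale, factor=factor), timesteps).get()


def with_starmap(pairs):
    # starmap_async unpacks each (timestep, factor) tuple into scale(timestep, factor).
    with Pool(processes=2) as pool:
        return pool.starmap_async(scale, pairs).get()


if __name__ == "__main__":
    print(with_partial([0, 1, 2], 10))                # [0, 10, 20]
    print(with_starmap([(0, 10), (1, 10), (2, 10)]))  # [0, 10, 20]
```

functools.partial also works on Python 2; starmap_async does not exist there.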
In your case, it might be better to restructure the code slightly to put all your arguments in a single list and call map_async just once with that list.
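A minimal sketch of that restructuring, assuming post_processing_0.main can be rewritten to handle a single timestep (the hypothetical `process_timestep` below). The hand-written chunks() helper disappears and map_async's chunksize parameter does the batching instead. The value 4 is arbitrary; if chunksize is omitted, map_async picks one from the input length and the number of worker processes.

```python
from multiprocessing import Pool


def process_timestep(t):
    """Hypothetical single-timestep replacement for post_processing_0.main.
    Like the original, it returns nothing; real code would read, transform
    and write the data for timestep t here."""


def main(timesteps=100, proc_num=8):
    pool = Pool(processes=proc_num)
    # One call submits every timestep; chunksize sets how many timesteps are
    # sent to a worker at a time.
    async_result = pool.map_async(process_timestep, range(timesteps), chunksize=4)
    pool.close()
    pool.join()
    # get() re-raises a worker's exception if any job failed; otherwise it
    # returns one None per timestep.
    return len(async_result.get())


if __name__ == "__main__":
    print(main())  # 100
```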
Upvotes: 15