arun
arun

Reputation: 138

can't pickle type 'module' in multiprocessing.pool

I am parallelizing my simulation with Multiprocessing.pool, however i cannot pass a type 'module' in the pool.map which gives pickling error, i need to use that argument in the target function of pool.map to parallelize my code. In the function sample(), the fourth argument 'sim' is of type 'Module', so i cannot pass it with p.map(), since it cannot be iterated, but i need that argument in the function parallel(), which should be used as

       model=sim.simulate(modelname, packname, config)

But currently i am importing that module statically and calling in the function parallel() as

       model=OpenModelica.simulate(modelname, packname, config)

Currently my code looks like this, is there way to declare the argument 'sim' in function sample() as global and access it in the target function parallel().

       def sample(file,model,config,sim,resultDir,deleteDir):
           from multiprocessing import Pool
           p=Pool()
           p.map(parallel,zip(file,model,dirs,resultpath,config))

       def parallel(modellists):
         packname=[]
         packname.append(modellists[0])     
         modelname=modellists[1]
         dirname=modellists[2]
         path=modellists[3]
         config=modellists[4]
         os.chdir(dirname)   
         model=OpenModelica.Model(modelname, packname, config)

Upvotes: 2

Views: 1294

Answers (1)

Mike McKerns
Mike McKerns

Reputation: 35247

This is because pickle is unable to serialize a module, and thus multiprocessing can't pass the module through the map. However, if you use a fork of multiprocessing called pathos.multiprocessing, it works. This is because pathos uses the dill serializer, which can pickle modules.

>>> import dill 
>>> import numpy
>>> 
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool()
>>> 
>>> def getname(x):
...   return getattr(x, '__package__', None)
... 
>>> p.map(getname, [dill, numpy])
['dill', 'numpy']

It also works for multiple arguments, so it's a bit more natural than having to zip all the arguments -- and has asynchronous and iterative maps as well.

>>> packname = list('abcde')
>>> modelname = list('ABCDE')
>>> dirname = list('12345')
>>> config = [1,2,3,4,5]
>>> import math
>>> f = [math.sin, math.cos, math.sqrt, math.log, math.tan]
>>> 
>>> def parallel(s1, s2, si, i, f): 
...     s = (s1 + s2).lower().count('b')
...     return f(int(si) + i - s) 
... 
>>> res = p.amap(parallel, packname, modelname, dirname, config, f)
>>> print "asynchronous!"
'asynchronous!'
>>> res.get()
[0.9092974268256817, -0.4161468365471424, 2.449489742783178, 2.0794415416798357, 0.6483608274590867]

Get pathos here: https://github.com/uqfoundation

Upvotes: 2

Related Questions