Reputation: 7374
I'm just getting started with the multiprocessing library in my code base, using it to parallelise a simple for loop. Previously, in a serial for loop, I would import a custom configuration .py file and pass it to a function to be run.
However, I'm having issues passing the configuration module to the parallelised function.
NB: There are multiple custom configuration .py files, which I want to pass to the different processes.
Example:
    import importlib
    import multiprocessing as mp

    # `configs` and `prefix` are defined elsewhere in the code base.

    def get_custom_config():
        config_list = []
        for project_config in configs:
            config = importlib.import_module("config.%s.%s" % (prefix, project_config))
            config_list.append(config)
        return config_list

    def print_config(config):
        print(config.something_in_config_file)

    if __name__ == "__main__":
        config_list = get_custom_config()
        pool = mp.Pool(processes=2)
        pool.map(print_config, config_list)
Returns:
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 251, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 567, in get
        raise self._value
    cPickle.PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed
What is the best way of passing a module to a parallel process?
Upvotes: 1
Views: 1615
Reputation: 146540
I do have a possible solution for you, but I don't like the approach you have.
    config = importlib.import_module("config.%s.%s" % (prefix, project_config))
You should try to have config as a dictionary of key-value pairs instead of as a module, or import it that way.
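One way to follow this advice, as a rough sketch: copy the picklable, public attributes of each config module into a plain dict and send the dict to the pool. The names module_to_dict and make_demo_config are hypothetical and not from the question; make_demo_config only stands in for importlib.import_module on a real config file so that the example is self-contained.

```python
import multiprocessing as mp
import pickle
import types


def module_to_dict(module):
    """Copy a module's public, picklable attributes into a plain dict."""
    settings = {}
    for name, value in vars(module).items():
        # Skip private/dunder names and imported modules.
        if name.startswith("_") or isinstance(value, types.ModuleType):
            continue
        try:
            pickle.dumps(value)
        except Exception:
            continue  # leave out anything that cannot cross the process boundary
        settings[name] = value
    return settings


def print_config(config):
    # The worker receives an ordinary dict, not a module.
    print(config["something_in_config_file"])
    return config["something_in_config_file"]


def make_demo_config(value):
    # Hypothetical stand-in for importlib.import_module("config.<prefix>.<name>").
    module = types.ModuleType("demo_config")
    module.something_in_config_file = value
    return module


if __name__ == "__main__":
    config_list = [module_to_dict(make_demo_config(v)) for v in ("abc", "def")]
    pool = mp.Pool(processes=2)
    try:
        pool.map(print_config, config_list)
    finally:
        pool.close()
        pool.join()
```

Because a dict of simple values pickles without any extra registration, this avoids the PicklingError entirely.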
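The issue is that module objects are not picklable by default, in Python 2.7 or in Python 3.x. Functions defined at the top level of a module are pickled by reference to their name, which is why print_config itself can be sent to the workers. Lambdas and nested functions cannot be pickled this way, and Python 2.7 cannot pickle bound methods either.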
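A quick way to check what can be pickled is simply to try it. The sketch below is illustrative only; is_picklable and top_level_function are hypothetical names, not part of the original code.

```python
import pickle
import types


def top_level_function(x):
    return x


def is_picklable(obj):
    """Return True if pickle can serialise obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


if __name__ == "__main__":
    print("top-level function:", is_picklable(top_level_function))
    print("lambda:", is_picklable(lambda x: x))
    print("module:", is_picklable(types))
```

The module case is the one that fails in the question: pool.map has to pickle every item of config_list before sending it to a worker.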
    import copy_reg  # Python 2 name; the module is called copyreg in Python 3
    import importlib
    import multiprocessing as mp
    import types

    configs = ["abc", "def"]

    def _pickle_module(module):
        # Reduce a module to its import name; the receiving process re-imports it.
        module_name = module.__name__
        print("pickling " + module_name)
        return _unpickle_module, (module_name,)

    def _unpickle_module(module_name):
        return importlib.import_module(module_name)

    # Register the reducer so that pickle knows how to serialise module objects.
    copy_reg.pickle(types.ModuleType, _pickle_module)

    def get_custom_config():
        config_list = []
        for project_config in configs:
            config = importlib.import_module("config.%s" % project_config)
            config_list.append(config)
        return config_list

    def print_config(config):
        print(vars(config))

    if __name__ == "__main__":
        config_list = get_custom_config()
        pool = mp.Pool(processes=2)
        pool.map(print_config, config_list)
This basically re-imports the module in the other process, so remember that you are not sharing data between the processes. This is fine for read-only variables.
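To make the re-import behaviour concrete, here is a minimal sketch that registers a module reducer and round-trips a module through pickle in a single process. It assumes the registration works the same way under Python 3's copyreg; reduce_module is a hypothetical name, and the standard-library string module stands in for a config module.

```python
import importlib
import pickle
import types

try:
    import copyreg  # Python 3
except ImportError:
    import copy_reg as copyreg  # Python 2


def reduce_module(module):
    # Only the dotted name is serialised; unpickling calls
    # importlib.import_module(name) in whichever process loads the pickle.
    return importlib.import_module, (module.__name__,)


copyreg.pickle(types.ModuleType, reduce_module)

if __name__ == "__main__":
    import string

    clone = pickle.loads(pickle.dumps(string))
    # Within one process the import system hands back the cached module.
    print(clone is string)
```

In a worker process the same call returns that process's own copy of the module, so changes made to module attributes in one process are not visible in another.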
But as I mentioned, passing modules to a different process does not make much sense. Try to fix your approach instead of using the code I posted.
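One possible way to restructure the approach, sketched under assumptions: send only the module name (a string) to each worker and do the import inside the worker. Here read_setting is a hypothetical helper, and the standard-library string module stands in for the custom config modules so that the example is self-contained.

```python
import importlib
import multiprocessing as mp


def read_setting(task):
    """Import the named module inside the worker and read one attribute."""
    module_name, attribute = task
    # Only the (module_name, attribute) strings cross the process boundary.
    config = importlib.import_module(module_name)
    return getattr(config, attribute)


if __name__ == "__main__":
    # Stand-ins for ("config.<prefix>.<name>", "something_in_config_file").
    tasks = [("string", "digits"), ("string", "ascii_lowercase")]
    pool = mp.Pool(processes=2)
    try:
        for value in pool.map(read_setting, tasks):
            print(value)
    finally:
        pool.close()
        pool.join()
```

Strings pickle without any special handling, so no copy_reg registration is needed with this layout.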
PS: The solution was inspired by the question "Can't pickle <type 'cv2.BRISK'>: attribute lookup cv2.BRISK failed".
Upvotes: 2