Rekovni

Reputation: 7374

Can't import custom Python module using the multiprocessing library

I'm just getting started with the multiprocessing library to parallelise a simple for loop. Previously, in the serial version of the loop, I would import a custom configuration .py file and pass it to a function to be run.

However, I'm having issues passing the configuration module to the parallelised function.

NB: There are multiple custom configuration .py files, which I want to pass to the different processes.

Example:

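import importlib
import multiprocessing as mp

# `configs` (a list of module names) and `prefix` are assumed to be defined elsewhere.

def get_custom_config():
    config_list = []
    for project_config in configs:
        config = importlib.import_module("config.%s.%s" % (prefix, project_config))
        config_list.append(config)
    return config_list

def print_config(config):
    print(config.something_in_config_file)

if __name__ == "__main__":
    config_list = get_custom_config()

    pool = mp.Pool(processes=2)
    # Fails here: each module object has to be pickled to reach a worker process.
    pool.map(print_config, config_list)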

Returns:

  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed

What is the best way of passing a module to a parallel process?

Upvotes: 1

Views: 1615

Answers (1)

Tarun Lalwani

Reputation: 146540

I do have a possible solution for you, but I don't like the approach you have.

config = importlib.import_module("config.%s.%s" % (prefix, project_config))

You should try to make config a dictionary of key-value pairs instead of a module, or load it in that form.
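As a rough illustration of the dictionary idea, here is a minimal sketch that copies a module's public, picklable attributes into a plain dict. The helper name `module_to_dict` and the stand-in module `demo_config` are hypothetical and not part of the original question; the code is written for Python 3.

```python
import pickle
import types


def module_to_dict(module):
    """Copy a module's public, picklable attributes into a plain dict."""
    result = {}
    for name, value in vars(module).items():
        if name.startswith("_"):
            continue  # skip private and dunder names such as __name__
        try:
            pickle.dumps(value)
        except Exception:
            continue  # skip values that cannot be sent to another process
        result[name] = value
    return result


# Stand-in for an imported config module (hypothetical name and attributes).
demo_config = types.ModuleType("demo_config")
demo_config.something_in_config_file = "hello"
demo_config.retries = 3

config_dict = module_to_dict(demo_config)
```

A dict built this way can be handed to `pool.map` directly, since it only holds values that pickle accepted.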

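The issue is that module objects are not picklable by default, in Python 2.7 or in Python 3.x. Module-level functions are a different case: pickle stores them by reference (module and name), which is why the function passed to pool.map is fine while the modules in the list are not.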
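A quick way to see the difference is to try pickling each kind of object. This is a small sketch for Python 3; the helper `is_picklable` is a hypothetical name, and the standard `json` module is used only as a convenient example.

```python
import json
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


function_ok = is_picklable(json.dumps)  # a module-level function
module_ok = is_picklable(json)          # the module object itself
```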

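import copy_reg  # Python 2 name; the module is called copyreg in Python 3
import importlib
import multiprocessing as mp
import types

configs = ["abc", "def"]


def _pickle_module(module):
    # Reduce a module to the function and arguments needed to rebuild it.
    module_name = module.__name__
    print("pickling " + module_name)
    path = getattr(module, "__file__", None)
    return _unpickle_module, (module_name, path)


def _unpickle_module(module_name, path):
    # path is unused: the receiving process re-imports the module by name.
    return importlib.import_module(module_name)


# Register the reducer so pickle knows how to handle module objects.
copy_reg.pickle(types.ModuleType, _pickle_module, _unpickle_module)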


def get_custom_config():
    config_list = []
    for project_config in configs:
        config = importlib.import_module("config.%s" % (project_config))
        config_list.append(config)
    return config_list


def print_config(config):
    print(vars(config))


if __name__ == "__main__":
    config_list = get_custom_config()

    pool = mp.Pool(processes=2)
    pool.map(print_config, config_list)

This basically re-imports the module in the other process, so remember that you are not sharing data between them. That is fine for read-only variables.

But as I mentioned, passing modules to a different process doesn't make much sense. Try to fix your approach instead of using the code I posted.
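One way to change the approach, sketched here for Python 3, is to pass module names (plain strings, which pickle without any help) and do the import inside the worker. The function `read_setting` is a hypothetical name, and the standard `string` module stands in for the config modules from the question.

```python
import importlib
import multiprocessing as mp


def read_setting(task):
    """Import a module by name in the worker and return one attribute."""
    module_name, attribute = task
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Stand-ins for ("config.<prefix>.<name>", "something_in_config_file").
tasks = [("string", "digits"), ("string", "hexdigits")]

if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        # Only the (name, attribute) tuples cross the process boundary.
        print(pool.map(read_setting, tasks))
```

Only strings and the returned values are pickled here, so no custom reducer is needed.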

PS: Solution inspired by the question "Can't pickle <type 'cv2.BRISK'>: attribute lookup cv2.BRISK failed"

Upvotes: 2
