Rahman

Reputation: 452

get_context("spawn") does not work in multiprocessing

I want to execute a function in parallel, so I have used the following code:

from multiprocessing import Pool, get_context

def multi(itr):
    return {itr: [{f'test{itr}'}]}


def test_parallel():
    list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with Pool() as _pool:
        res = _pool.map(multi, list1)

Everything is OK and works correctly. But when the input (list1) gets larger, the program sometimes (not always) gets stuck. So I googled and found a suggestion to use get_context("spawn"):

def test_parallel():
    list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with get_context("spawn").Pool() as _pool:
        res = _pool.map(multi, list1)

I call the above function in the Python console with the following commands:

import test_parallel
test_parallel()

It throws a strange error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 114, in _main
    prepare(preparation_data)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "/usr/lib/python3.6/runpy.py", line 261, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "/usr/lib/python3.6/runpy.py", line 231, in _get_code_from_file
    with open(fname, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/PycharmProjects/djangoProject/<input>'

But when I remove get_context("spawn") and go back to with Pool() as _pool:, everything is OK.

OS: Ubuntu 18

Python version: 3.6

Upvotes: 3

Views: 3598

Answers (2)

Erik

Reputation: 3198

For Linux-based systems, you might want to try

get_context("fork")

instead of "spawn" (a sketch follows).
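This works because a forked child inherits the parent process's memory, so a function defined in the console or main script can still be resolved by name inside the worker. A minimal sketch of the idea, reusing the asker's multi (my own illustration, not code from this answer):

from multiprocessing import get_context

def multi(itr):
    return {itr: [{f'test{itr}'}]}

def test_parallel():
    list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # a forked worker inherits the parent's memory, so it can resolve
    # __main__.multi even when multi was defined interactively
    with get_context("fork").Pool() as _pool:
        return _pool.map(multi, list1)

Note that "fork" is already the default start method on Linux, which is why the asker's plain with Pool() as _pool: version ran without this error.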

Upvotes: 0

Aaron

Reputation: 11075

TLDR; put your target function in a separate module (.py file) and import it.

When using any start method other than "fork", there are special requirements on the target function and the arguments of the child process: they must be picklable, so they can be sent over a pipe to the child process. A sketch of that requirement is below.
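For example (my own illustration, not from the answer), a module-level function can be pickled, but a lambda has no importable name, so the very serialization that "spawn" relies on fails:

import pickle

def double(x):
    return 2 * x

pickle.dumps(double)           # fine: serialized by reference to its name

pickle.dumps(lambda x: 2 * x)  # raises pickle.PicklingError: the lambda
                               # cannot be looked up by name on import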

The pickle library does not copy the actual code of a function when it serializes it; rather, it copies the import path, so the function can be re-imported on the other end of the pickle (the same is done for classes: the instance data is serialized, but the class definition itself is re-imported).
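You can see this by inspecting a pickled function; a quick sketch (my own, assuming multi is defined at module level):

import pickle

def multi(itr):
    return {itr: [{f'test{itr}'}]}

payload = pickle.dumps(multi)
# only the module and qualified name are stored, not the bytecode;
# unpickling re-imports the module and looks "multi" up there
print(b'__main__' in payload, b'multi' in payload)  # True True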

That leads us to importing functions: an import needs a file to execute, so if you're using an interactive console (including IPython notebooks), the main process has no file the child can import from, because the live session data doesn't exist anywhere but in memory.

If you are running the script from a command prompt (or similar), functions defined in the main script can be accessed by importing the same file that was run as the main script. This is also why the main script needs to be written so that it doesn't do anything unwanted when it is imported rather than run as main, by guarding the entry point with if __name__ == "__main__": (see the sketch below).
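As a sketch (the filename is my own choice), a script laid out for the "spawn" start method looks like this:

# script.py -- run as: python script.py
from multiprocessing import get_context

def multi(itr):
    return {itr: [{f'test{itr}'}]}

if __name__ == "__main__":
    # this block is skipped when a child re-imports this file as
    # __mp_main__, so workers don't recursively spawn pools of their own
    with get_context("spawn").Pool() as _pool:
        res = _pool.map(multi, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    print(res)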

There is, however, a workaround for using multiprocessing in an interactive environment, which bypasses the problem of importing the main script entirely: put your target function in a separate, importable module. That way, when the child process attempts to un-pickle the target function, it has a valid file to execute.
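Applied to the question, a minimal sketch (the module name worker.py is my own choice):

# worker.py -- a separate, importable file holding the target function
def multi(itr):
    return {itr: [{f'test{itr}'}]}

# back in the interactive console:
from multiprocessing import get_context
from worker import multi  # picklable by reference as "worker.multi"

with get_context("spawn").Pool() as _pool:
    res = _pool.map(multi, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])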

Upvotes: 5
