Reputation: 452
I want to execute a function in parallel, so I used the following code:
from multiprocessing import Pool, get_context

def multi(itr):
    return {itr: [{f'test{itr}'}]}

def test_parallel():
    list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with Pool() as _pool:
        res = _pool.map(multi, list1)
Everything is OK and works correctly.
But when the input (list1) gets larger, the program sometimes (not always) gets stuck. So I googled and found a suggested solution: use get_context("spawn"):
def test_parallel():
    list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with get_context("spawn").Pool() as _pool:
        res = _pool.map(multi, list1)
I call the above function in the Python console with the following commands:

from test_parallel import test_parallel
test_parallel()
It throws a strange error:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 114, in _main
    prepare(preparation_data)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "/usr/lib/python3.6/runpy.py", line 261, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "/usr/lib/python3.6/runpy.py", line 231, in _get_code_from_file
    with open(fname, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/PycharmProjects/djangoProject/<input>'
But when I remove get_context("spawn") and go back to with Pool() as _pool:, everything is OK.
OS: Ubuntu 18
Python version: 3.6
Upvotes: 3
Views: 3598
Reputation: 3198
For Linux-based systems, you might want to try get_context("fork") rather than "spawn".
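A minimal sketch of what that would look like with the question's code (assuming the same multi and list1 as above):

from multiprocessing import get_context

def multi(itr):
    return {itr: [{f'test{itr}'}]}

def test_parallel():
    list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # "fork" is the default start method on Linux; asking for it explicitly
    # sidesteps the re-import machinery that "spawn" relies on
    with get_context("fork").Pool() as _pool:
        return _pool.map(multi, list1)

Note that "fork" inherits the parent's memory, so nothing has to be pickled and re-imported; however, the occasional hang that motivated the switch to "spawn" in the first place may still occur.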
Upvotes: 0
Reputation: 11075
TL;DR: put your target function in a separate module (.py file) and import it.
When using any "start method" other than "fork", there are some special requirements on the target function and the arguments of the child process. Specifically, they must be picklable so they can be sent over a pipe to the child process.
The pickle library does not copy the actual code content of functions when it serializes them; rather, it copies the import path, so the function can be imported on the other end of the pickle (the same is done for classes: the instance data is serialized, but the class definition itself is re-imported).
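A quick way to see that by-reference behavior (a minimal sketch, run as a script; the function mirrors the question's multi):

import pickle

def multi(itr):
    return {itr: [{f'test{itr}'}]}

payload = pickle.dumps(multi)
# the payload holds only the import path (module name and qualified name),
# not the function's bytecode; unpickling imports __main__ and looks up "multi"
assert b'__main__' in payload and b'multi' in payload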
That leads us to importing functions: import needs a file to execute, so if you're using an interactive console (including IPython notebooks), the main process has no file from which to import, because the live session data exists only in memory.
If you are running the script from a command prompt (or similar), functions defined in the main script can be accessed by importing the same file that was run as the main script. This is also why the main script needs to be written so that it doesn't do anything unwanted when it is imported rather than run as main, by guarding the entry point with if __name__ == "__main__": (a minimal guard is sketched below).
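Here is that guard applied to the question's code:

from multiprocessing import get_context

def multi(itr):
    return {itr: [{f'test{itr}'}]}

if __name__ == "__main__":
    # executed only when the file is run as the main script, not when the
    # spawned child re-imports the file to find multi
    list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with get_context("spawn").Pool() as _pool:
        print(_pool.map(multi, list1))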
There is, however, a workaround for using multiprocessing in an interactive environment: bypass the problem of importing the main script entirely, and put your target function in a separate, importable module. That way, when the child process attempts to un-pickle the target function, it has a valid file to execute.
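For example (a sketch; the file name worker.py is hypothetical):

# worker.py -- a separate, importable module holding the target function
def multi(itr):
    return {itr: [{f'test{itr}'}]}

Then, from the interactive console:

from multiprocessing import get_context
from worker import multi  # picklable by import path: worker.multi

with get_context("spawn").Pool() as _pool:
    print(_pool.map(multi, [1, 2, 3, 4, 5]))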
Upvotes: 5