Reputation: 1595
I have a simple function that I intend to run in Parallel using the Python multiprocessing module. However I get the following error RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
The error suggests that I add this:
if __name__ == '__main__':
freeze_support()
And most posts online suggest the same like this SO answer.
I added it and it works but I don't seem to understand why it's necessary for such a simple piece of code.
Code without __name__=="__main__" (throws RuntimeError)
import multiprocessing
import time
start = time.perf_counter()
def do_something():
print('Sleeping 1 second...')
time.sleep(1)
print('Done sleeping...')
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} second(s)')
Code with __name__=="__main__" (doesn't throw RuntimeError)
import multiprocessing
import time
start = time.perf_counter()
def do_something():
print('Sleeping 1 second...')
time.sleep(1)
print('Done sleeping...')
def main():
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} second(s)')
if __name__ == "__main__":
main()
Upvotes: 0
Views: 585
Reputation: 77347
In Windows, multiprocessing.Process
executes a fresh copy of python to run the code. It has to get the code you want to execute to load in that process so it pickles a snapshot of your current environment to expand in the child. For that to work, the child needs to reimport modules used by the parent. In particular, it needs to import the main script as a module. When you import, any code residing at module level executes.
So lets make the simplest case
foo.py
import multiprocessing as mp
process = mp.Process(target=print, args=('foo',))
process.start()
process.join()
process.start()
executes a new python which imports foo.py
. And there's the problem. That new foo
will create another subprocess which will again import foo.py. So yet another process is created.
The would go on until you blow up your machine except that python detects the problem and raises the exception.
THE FIX
Python modules have the __name__
attribute. If you run your program as a script, __name__
is "main", otherwise, __name__
is the name of your module. So, when a multiprocessing process is importing your main script to setup your environment, its name is not __main__
. You can use that to make sure that your MP work is only done in the parent module.
import multiprocessing as mp
if __name__ == "__main__":
# run as top level script, but not as imported module
process = mp.Process(target=print, args=('foo',))
process.start()
process.join()
Upvotes: 1