Dave Kalu

Reputation: 1595

Python multiprocessing RuntimeError

I have a simple function that I want to run in parallel using Python's multiprocessing module. However, I get the following error: RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. The error message suggests that I add this:

if __name__ == '__main__':
    freeze_support()

Most posts online suggest the same thing, like this SO answer.

I added it and it works, but I don't understand why it's necessary for such a simple piece of code.

Code without __name__=="__main__" (throws RuntimeError)

import multiprocessing
import time

start = time.perf_counter()


def do_something():
    print('Sleeping 1 second...')
    time.sleep(1)
    print('Done sleeping...')

p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()

finish = time.perf_counter()

print(f'Finished in {round(finish - start, 2)} second(s)')

Code with __name__=="__main__" (doesn't throw RuntimeError)

import multiprocessing
import time

start = time.perf_counter()


def do_something():
    print('Sleeping 1 second...')
    time.sleep(1)
    print('Done sleeping...')


def main():
    p1 = multiprocessing.Process(target=do_something)
    p2 = multiprocessing.Process(target=do_something)
    p1.start()
    p2.start()

    finish = time.perf_counter()
    print(f'Finished in {round(finish - start, 2)} second(s)')


if __name__ == "__main__":
    main()

Upvotes: 0

Views: 585

Answers (1)

tdelaney

Reputation: 77347

On Windows, multiprocessing.Process starts a fresh copy of Python to run the code. It has to get the code you want to execute loaded into that process, so it pickles what the child needs from your current environment and unpickles it in the child. For that to work, the child needs to re-import the modules used by the parent. In particular, it needs to import the main script as a module. When you import a module, any code at module level executes.
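To see which behaviour applies on a given machine, you can ask multiprocessing for its start method. This is a minimal sketch; the helper name describe_start_method is made up for illustration. The "fresh copy of Python" behaviour described above corresponds to the "spawn" start method, which is the default on Windows and can be selected on other platforms too.

```python
import multiprocessing as mp
import sys


def describe_start_method():
    """Return the name of the start method multiprocessing will use by default."""
    # Hypothetical helper for this example; it only wraps a real stdlib call.
    return mp.get_start_method()


if __name__ == "__main__":
    print(f"platform: {sys.platform}")
    print(f"default start method: {describe_start_method()}")
    print(f"available start methods: {mp.get_all_start_methods()}")
```

If the default printed here is "spawn", an unguarded script like the one in the question will hit the RuntimeError.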

So let's take the simplest case:

foo.py

import multiprocessing as mp
process = mp.Process(target=print, args=('foo',))
process.start()
process.join()

process.start() launches a new Python interpreter, which imports foo.py. And there's the problem. That new copy of foo creates another subprocess, which again imports foo.py, so yet another process is created.

This would go on until you blew up your machine, except that Python detects the problem and raises the exception.
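You can watch the re-import happen with a small experiment. This is only a sketch: role_for and child_task are names invented for the example, and the "spawn" context is requested explicitly so the Windows-style behaviour shows up on other platforms as well. Save it to a file and run it as a script. The module-level print should appear once for the parent and once more for the child, and the child's copy is loaded under a name other than "__main__" (CPython uses "__mp_main__").

```python
import multiprocessing as mp
import os


def role_for(module_name):
    """Label a copy of this file by the __name__ it was loaded under."""
    return "parent script" if module_name == "__main__" else "re-imported copy"


# Module-level code: runs in the parent and again in each spawned child.
print(f"pid {os.getpid()}: loaded as {__name__!r} ({role_for(__name__)})")


def child_task():
    print(f"pid {os.getpid()}: running child_task")


if __name__ == "__main__":
    # Only the parent gets here, so only the parent starts a process.
    ctx = mp.get_context("spawn")
    process = ctx.Process(target=child_task)
    process.start()
    process.join()
```

Without the guard, the re-imported copy would reach process.start() too, which is exactly the runaway chain described above.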

THE FIX

Python modules have a __name__ attribute. If you run your program as a script, __name__ is "__main__"; otherwise, __name__ is the name of your module. So, when a multiprocessing child process imports your main script to set up your environment, its name is not "__main__". You can use that to make sure that your multiprocessing work is only done in the parent process.

import multiprocessing as mp

if __name__ == "__main__":
    # run as top level script, but not as imported module
    process = mp.Process(target=print, args=('foo',))
    process.start()
    process.join()
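Applied to the script from the question, the same guard might look like the sketch below. Two things here are my additions rather than part of the original code: the join() calls, so that the reported time includes the workers, and the seconds parameter on do_something, which just makes the function easier to try out.

```python
import multiprocessing
import time


def do_something(seconds=1):
    # 'seconds' is an added parameter; the original always slept for 1 second.
    print(f'Sleeping {seconds} second(s)...')
    time.sleep(seconds)
    print('Done sleeping...')


def main():
    start = time.perf_counter()

    processes = [multiprocessing.Process(target=do_something) for _ in range(2)]
    for p in processes:
        p.start()
    # Wait for both workers before stopping the clock.
    for p in processes:
        p.join()

    finish = time.perf_counter()
    print(f'Finished in {round(finish - start, 2)} second(s)')


if __name__ == "__main__":
    # Skipped when a child process re-imports this file.
    main()
```

Because both workers sleep at the same time, the total should come out close to one second rather than two, plus whatever the process start-up costs on your machine.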

Upvotes: 1
