Trying to understand Multiprocessing with main in python

Using the code below I am getting strange output:

import sys
from multiprocessing import Process
import time
from time import strftime

now = time.time()
print time.strftime("%Y%m%d %H:%M:%S", time.localtime(now))

fr = [1, 2, 3]
for row in fr:
    print 3

print 1

def worker():
    print 'worker line'
    time.sleep(1)
    sys.exit(1)

def main():
    print 'start worker'
    Process(target=worker, args=()).start()
    print 'main line'

if __name__ == "__main__":
    start_time = time.time()
    main()
    end_time = time.time()
    duration = end_time - start_time
    print "Duration: %s" % duration

The output is:

20120324 20:35:53
3
3
3
1
start worker
main line
Duration: 0.0
20120324 20:35:53
3
3
3
1
worker line

I was thinking I would get this:

20120324 20:35:53
3
3
3
1
start worker
worker line
main line
Duration: 1.0

Why is this part run twice? I am using Python 2.7 on 64-bit Windows:

20120324 20:35:53
3
3
3
1
worker line

Upvotes: 3

Answers (2)

SingleNegationElimination

Reputation: 156138

The problem is basically that multiprocessing was really designed for POSIX systems, those with the fork(2) syscall. On those operating systems, a process can split into two: the child clones the parent's state, and both resume running from the same place, with the child now having a new process ID. In that situation, multiprocessing only needs some mechanism to ship state from parent to child as needed, with the certainty that the child already has most of the Python state it needs.
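To make the fork behaviour concrete, here is a minimal sketch using os.fork directly. It is POSIX only and will not run on Windows. The fork_demo name is hypothetical, and print() is written with parentheses so the sketch runs on both Python 2.7 and Python 3.

```python
import os


def fork_demo():
    # Hypothetical helper: state created before the fork is visible in the child.
    state = ["set up before the fork"]
    pid = os.fork()  # POSIX only; both processes continue from this line
    if pid == 0:
        # Child: it holds a copy of the parent's memory, including `state`.
        ok = (state == ["set up before the fork"])
        os._exit(0 if ok else 1)  # leave the child immediately
    # Parent: `pid` is the child's process ID; wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, child_status = fork_demo()
    print("parent %d saw child %d exit with status %d"
          % (os.getpid(), child_pid, child_status))
```

The child never re-imports anything here: it simply carries on with the state it inherited, which is why nothing is printed twice on Linux.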

Windows does not have fork().

And so multiprocessing has to pick up the slack. This basically involves launching a brand new Python interpreter running a multiprocessing child script. Almost immediately, the parent will ask the child to use something that is in the parent's state, and so the child has to recreate that state from scratch by importing your script.

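So anything that happens at import time in your script will happen twice: once in the parent, and again in the child as it recreates the Python environment needed to serve the parent. The block under if __name__ == "__main__": is not repeated, because the child does not import your script under the name __main__.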
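A minimal sketch of the usual remedy, assuming the goal is for the top-level prints to run only once: leave only definitions at module level and move everything else under the if __name__ == "__main__": guard. The banner helper is a hypothetical name introduced for illustration, and print() is written with parentheses so the sketch runs on both Python 2.7 and Python 3.

```python
import sys
import time
from multiprocessing import Process


def banner():
    # Hypothetical helper holding the work that used to sit at module level.
    # It is only called from the guarded block, so a child that re-imports
    # this file does not repeat it.
    print(time.strftime("%Y%m%d %H:%M:%S", time.localtime(time.time())))
    for row in [1, 2, 3]:
        print(3)
    print(1)


def worker():
    print('worker line')
    time.sleep(1)
    sys.exit(1)  # the child exits with status 1, as in the question


def main():
    print('start worker')
    p = Process(target=worker, args=())
    p.start()  # returns immediately; the child runs in the background
    print('main line')
    return p


if __name__ == "__main__":
    banner()  # runs in the parent only
    main()
```

The child still imports the file on Windows, but importing it now only defines functions, so nothing is printed a second time.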

Upvotes: 6

Brendan Wood

Reputation: 6440

This is what I get when I run your code on Linux using Python 2.7.3:

20120324 23:05:49
3
3
3
1
start worker
main line
Duration: 0.0045280456543
worker line

I don't know why yours runs twice, but I can tell you why it doesn't return the expected duration time or print in the "correct" order.

When you start a process using multiprocessing, the launch is asynchronous. That is, the .start() method returns immediately in the parent process, so that the parent can continue to work and do other things (like launch more processes) while the child process does its own thing in the background. If you want to block the parent process from proceeding until the child process ends, use the .join() method, like so:

def main():
    print 'start worker'
    p = Process(target=worker, args=())
    p.start()
    p.join()
    print 'main line'
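Combining this with the timing code from the question gives the following sketch. The run_and_time name is hypothetical, and print() is written with parentheses so it runs on both Python 2.7 and Python 3.

```python
import sys
import time
from multiprocessing import Process


def worker():
    print('worker line')
    time.sleep(1)
    sys.exit(1)


def run_and_time():
    # Hypothetical helper: start the child, wait for it, report elapsed time.
    start_time = time.time()
    print('start worker')
    p = Process(target=worker, args=())
    p.start()  # returns immediately; the child runs concurrently
    p.join()   # block here until the child has exited
    print('main line')
    return time.time() - start_time, p.exitcode


if __name__ == "__main__":
    duration, exitcode = run_and_time()
    print("Duration: %s" % duration)
    print("Child exit code: %s" % exitcode)
```

Because the parent now waits for the child, the measured duration should include the worker's one-second sleep, and 'worker line' should appear before 'main line'.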

Upvotes: 0
