better understanding python multiprocessing

Question

I wrote the following code:

from multiprocessing import Pool 

class A:
    def __init__(self) -> None:
        self.a = 1
    def incr(self,b):
        self.a += 1
        print(self.a)
    def execute(self):
        with Pool(2) as p:
            p.starmap(self.incr, [[1],[1],[1],[1]])

a = A()
a.execute()
print(a.a)

The output is 2 2 2 2 1. I want to understand what exactly happens in this scenario. Does the pool create four copies of self? If so, how is this copying done?

azelcer · Accepted Answer

In this scenario, the file runs linearly until a.execute() is called. When Pool(2) is created inside execute(), multiprocessing starts two additional worker processes. How these processes are created depends on the OS and, in some cases, can be selected with multiprocessing.set_start_method. The main difference between the methods is that with the 'fork' method the new processes are created by copying the current process (it is more complicated than that, but that's the idea), while with the 'spawn' method a new Python interpreter is started, the file is re-executed, and then the call is performed. In the latter case, because the file is re-executed, it is very important to use the if __name__ == "__main__": guard.
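Besides set_start_method, the standard library offers multiprocessing.get_context(method), which returns a context object whose Pool uses that start method without changing the global default (set_start_method raises RuntimeError if the method was already set, unless force=True is passed). Below is a minimal sketch, assuming a hypothetical worker function describe() that reports its PID and the module name it sees. It requests 'spawn', which is available on both POSIX and Windows, so the workers should report '__mp_main__' because each child re-imports the file. It has to be run as a script file, since spawned children re-import it.

```python
import multiprocessing as mp
import os


def describe(x):
    # Hypothetical worker function: runs in a child process and reports
    # its PID and the name of the module it was loaded from.
    return os.getpid(), __name__, x * 10


if __name__ == "__main__":
    # Platform-dependent list, e.g. ['fork', 'spawn', 'forkserver'] on Linux
    print(mp.get_all_start_methods())

    # The context applies 'spawn' only to objects created from it;
    # the global default start method is left untouched.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as p:
        results = p.map(describe, [1, 2, 3])

    for pid, modname, value in results:
        # modname is '__mp_main__': the child re-imported this file under that name
        print(pid, modname, value)
```

Using a context keeps the choice local to one pool, which avoids the "already set" check the longer example below needs.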

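Each process works on its own copy of a, so what happens in one process does not affect the other copies: even though every copy ends up with self.a equal to 2, the original a is unchanged. The copies are actually made per task: self.incr is a bound method, and starmap pickles it (and therefore self) for each chunk of work it sends (here, each call), so every call starts from a fresh copy with a == 1, even when the same worker handles two of the four calls. You can test the different methods for starting processes with this piece of code (delays added for clarity):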

from multiprocessing import Pool, set_start_method, get_start_method
import os
from time import sleep

globalflag = "Flag"

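class A:
    def __init__(self) -> None:
        self.a = 1

    def incr(self, b):
        sleep(b / 10)  # stagger the tasks so the output is easier to follow
        print("Process id:", os.getpid(), " incrementing")
        # 'fork' children inherit the value set in the parent ("Flog");
        # 'spawn' children re-import the module and see "Flag"
        print("Flag = ", globalflag)
        self.a += 1
        print(self.a)

    def execute(self):
        with Pool() as p:  # one worker per CPU by default
            p.starmap(self.incr, [[1], [2], [3], [4]])

# Runs in the parent and, with 'spawn', again in every child (as __mp_main__)
print("Running module:", __name__)

if __name__ == "__main__":
    method = 'fork'  # alternatives: 'spawn', 'forkserver'
    if get_start_method(allow_none=True) is None:
        print("Setting start method to ", method)
        set_start_method(method)
    else:
        print('Start method already set: ', get_start_method())
    a = A()
    globalflag = "Flog"  # changed only in the parent, after the module was imported
    a.execute()
    print(a.a)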

Output with method='fork':

Running module: __main__
Start method already set:  fork
Process id: 6687  incrementing
Flag =  Flog
2
Process id: 6688  incrementing
Flag =  Flog
2
Process id: 6689  incrementing
Flag =  Flog
2
Process id: 6690  incrementing
Flag =  Flog
2
1

Output with method='spawn':

Running module: __main__
Setting start method to  spawn
Running module: __mp_main__
Running module: __mp_main__
Running module: __mp_main__
Running module: __mp_main__
Running module: __mp_main__
Process id: 7096  incrementing
Flag =  Flag
2
Running module: __mp_main__
Process id: 7097  incrementing
Flag =  Flag
2
Running module: __mp_main__
Process id: 7098  incrementing
Flag =  Flag
2
Running module: __mp_main__
Process id: 7100  incrementing
Flag =  Flag
2
1
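To make the "how is this copying done?" part concrete: starmap has to ship self.incr to the workers, and pickling a bound method also pickles the instance it is bound to. In CPython the pool sends the function along with every chunk of arguments, so each task is unpickled into a brand-new A object that starts with a == 1. The snippet below reproduces only that serialization round trip with the pickle module in a single process; no pool is involved, and A is a trimmed copy of the class from the question that returns the new value instead of printing it.

```python
import pickle


class A:
    def __init__(self) -> None:
        self.a = 1

    def incr(self, b):
        self.a += 1
        return self.a


a = A()

# Pickling the bound method a.incr also pickles the instance a it is bound to.
payload = pickle.dumps(a.incr)

# Each load rebuilds an independent A object, much like each task a pool
# worker receives from starmap.
task1 = pickle.loads(payload)
task2 = pickle.loads(payload)

print(task1(1), task2(1), a.a)  # 2 2 1
print(task1.__self__ is a, task1.__self__ is task2.__self__)  # False False
```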
