Biel Cardona

Reputation: 103

multiprocessing module in python and modifying shared global variables

I have written a small python program to see if I understand how global variables are transmitted to "child" processes.

import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var

When I run it I get

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11]
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11]
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11]
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11]
6 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 100, 7, 8, 9, 10, 11]
7 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 100, 8, 9, 10, 11]
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

This is logical: the child processes modify the global variable, and under the copy-on-write mechanism a variable modified by a child process is copied first, so any change is visible only in that process.

My surprise was when I modified the code to print the identifiers of the variables:

import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var, id(shared_var)
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var, id(shared_var)

And got:

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
6 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 100, 7, 8, 9, 10, 11] 4504973968
7 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 100, 8, 9, 10, 11] 4504973968
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11] 4504973968
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11] 4504973968
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968

The identifiers of all the variables (in the main process and in the spawned processes) are the same, while I expected a different copy, and hence a different identifier, for each of the processes...

Does anyone know why I got these results? Also some references to how multiprocessing deals with global variables being read/written by created Processes would be great. Thanks!

Upvotes: 1

Views: 795

Answers (2)

idailylife

Reputation: 174

As you may know, id(x) in CPython returns the memory address of the object.

Please check https://superuser.com/questions/347765/is-virtual-memory-related-to-virtual-address-space-of-a-process and "Why Virtual Memory Address is the same in different process?". Basically, the operating system gives each process its own virtual address space, so the address returned by id() is a virtual one; the process has no idea about the actual (physical) memory address of an object.

Upvotes: 0

Pavel

Reputation: 7552

I think there's some confusion about the memory. You are not using multithreading but multiprocessing, so each worker runs in a separate process with its own virtual memory space. Therefore, each process has its own copy of shared_var from the very beginning. That copy is what gets modified in each call to f(x), leaving the actual variable in __main__ unaffected.

You can check the docs for the chapter on sharing memory between processes e.g. using multiprocessing.Array.
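
As a rough illustration of that approach, here is a minimal sketch in Python 3 syntax (the question uses Python 2). It assumes a POSIX system where the "fork" start method is available, and the names set_slot and run_demo are made up for this example. The list is replaced by a multiprocessing.Array, which lives in shared memory, so writes made by the children are visible in the parent:

```python
import multiprocessing


def set_slot(shared, index):
    # The Array lives in shared memory, so this write is visible to the parent.
    with shared.get_lock():
        shared[index] = 100


def run_demo():
    # Assumption: a POSIX system where the "fork" start method exists.
    ctx = multiprocessing.get_context("fork")
    # "i" is the typecode for a C int; the list provides the initial values.
    shared = ctx.Array("i", list(range(12)))
    workers = [ctx.Process(target=set_slot, args=(shared, i)) for i in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Slicing copies the shared values into an ordinary list.
    return shared[:]


if __name__ == "__main__":
    print(run_demo())
```

Unlike the plain list in the question, the first eight slots should read 100 in the parent once all workers have been joined.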

I'm not 100% sure why the address stays the same, but I think that since each new subprocess is spawned by forking the main process and copying its memory layout, the addresses in the virtual memory remain the same for each of the children. The physical memory address is of course different. That's why you see the same id, but different values.
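
One way to check this explanation is the small sketch below (Python 3 syntax, assuming a POSIX system with the "fork" start method; child and compare_ids are hypothetical names for this example). A forked child modifies its copy of the list and sends back its id() and contents, which the parent compares with its own:

```python
import multiprocessing

shared_var = list(range(12))


def child(conn):
    # This modifies the child's private copy of the list only.
    shared_var[0] = 100
    conn.send((id(shared_var), list(shared_var)))
    conn.close()


def compare_ids():
    # Assumption: a POSIX system where the "fork" start method exists.
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=child, args=(child_conn,))
    p.start()
    child_id, child_list = parent_conn.recv()
    p.join()
    # Same virtual address in both processes, but different contents.
    return child_id == id(shared_var), child_list, list(shared_var)


if __name__ == "__main__":
    print(compare_ids())
```

With fork, the child starts with a copy of the parent's address space, so the list sits at the same virtual address (same id) in both processes, while the parent's list keeps its original values. With a start method other than fork, the child re-imports the module and the ids need not match.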

Upvotes: 1
