Reputation: 103
I wrote a small Python program to check whether I understand how global variables are passed to "child" processes.
import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    # each call overwrites one slot of the global list
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var
When I run it I get
3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11]
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11]
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11]
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11]
6 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 100, 7, 8, 9, 10, 11]
7 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 100, 8, 9, 10, 11]
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
This makes sense: because of the copy-on-write mechanism, when a child process modifies a global variable the data is copied first, so the change is only visible inside that child process.
My surprise was when I modified the code to print the identifiers of the variables:
import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var, id(shared_var)
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var, id(shared_var)
And got:
3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
6 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 100, 7, 8, 9, 10, 11] 4504973968
7 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 100, 8, 9, 10, 11] 4504973968
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11] 4504973968
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11] 4504973968
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
The identifiers of all the variables (in the main process and in the child processes) are the same, while I expected a separate copy, with a different identifier, in each process.
Does anyone know why I got these results? Also, some references on how multiprocessing deals with global variables being read and written by the created Process objects would be great. Thanks!
Upvotes: 1
Views: 795
Reputation: 174
As you may know, in CPython id(x) returns the memory address of the object.
Please check https://superuser.com/questions/347765/is-virtual-memory-related-to-virtual-address-space-of-a-process and "Why Virtual Memory Address is the same in different process?". Basically, the operating system gives each process its own virtual address space, so the address a process sees is a virtual one; the process has no idea about the actual (physical) memory address of an object.
Upvotes: 0
Reputation: 7552
I think there's some confusion about the memory. You are not using multithreading but multiprocessing, so each worker runs in a separate process with its own virtual memory space. Therefore, each process has its own copy of shared_var from the very beginning. That copy is what gets modified in each call to f(x), leaving the actual variable in __main__ unaffected.
You can check the docs for the chapter on sharing memory between processes, e.g. using multiprocessing.Array.
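For illustration, here is a minimal sketch of how the question's example might look with multiprocessing.Array, so that writes made by the workers become visible in the parent. It is not the asker's code: it assumes Python 3 on a Unix-like system, it selects the fork start method explicitly so the workers inherit the module-level array, and the names shared_arr and run are made up for this example.

```python
import multiprocessing

# Assumption: use the fork start method explicitly, so that worker
# processes inherit module-level objects from the parent.
ctx = multiprocessing.get_context("fork")

# 'i' is the typecode for a C int. The buffer lives in shared memory,
# so every forked worker writes to the same underlying storage.
shared_arr = ctx.Array("i", list(range(12)))


def f(x):
    # This write goes to shared memory, not to a private copy.
    shared_arr[x] = 100
    return x * x


def run():
    """Map f over range(8) in 4 workers; return (results, final array)."""
    pool = ctx.Pool(4)
    try:
        results = pool.map(f, range(8))
    finally:
        pool.close()
        pool.join()
    return results, shared_arr[:]


if __name__ == "__main__":
    results, final = run()
    print(results)
    print(final)
```

Unlike the plain list in the question, the parent should now see the value 100 in the first eight slots after pool.map returns.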
I'm not 100% sure why the address stays the same, but I think it is because each new subprocess is created by forking the main process, which copies its memory layout, so the virtual addresses are the same in every child. The physical memory behind them is of course different. That's why you see the same id but different values.
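The forking explanation can be checked with a small experiment. The sketch below assumes a Unix-like system (os.fork is not available on Windows) and CPython, where id() is the object's address; the function name fork_and_compare is invented for this example. The child modifies its copy of the list and reports its id and first element back through a pipe.

```python
import os

shared_var = list(range(12))


def fork_and_compare():
    """Fork once; return (parent_id, child_id, child_first, parent_first)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: modify its own copy and report back to the parent.
        try:
            os.close(read_fd)
            shared_var[0] = 100
            msg = "%d %d" % (id(shared_var), shared_var[0])
            os.write(write_fd, msg.encode("ascii"))
            os.close(write_fd)
        finally:
            # Leave immediately without running the parent's cleanup code.
            os._exit(0)
    # Parent process: read the child's report and wait for it to exit.
    os.close(write_fd)
    data = os.read(read_fd, 1024).decode("ascii")
    os.close(read_fd)
    os.waitpid(pid, 0)
    child_id, child_first = (int(part) for part in data.split())
    return id(shared_var), child_id, child_first, shared_var[0]


if __name__ == "__main__":
    parent_id, child_id, child_first, parent_first = fork_and_compare()
    print(parent_id == child_id)      # same virtual address in both processes
    print(child_first, parent_first)  # the child's write is not seen by the parent
```

The ids match because the child starts with a copy of the parent's virtual address space, while the list contents diverge as soon as the child writes to its copy.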
Upvotes: 1