Reputation: 85
I have this code. In the sequential approach the message "not ok" is printed, while in the parallel approach ["ok", "ok", "ok"] is printed instead of the ["not ok", "not ok", "not ok"] that I expected.
How can I change the variable globVar without passing it as an argument to the test function?
import multiprocessing

global globVar
globVar = 'ok'

def test(arg1):
    print(arg1)
    return globVar

if __name__ == "__main__":
    globVar = 'not ok'

    # Sequential
    print(test(0))

    # Parallel
    pool = multiprocessing.Pool()
    argList = [0, 1, 2]
    result = pool.map(test, argList)
    pool.close()
Upvotes: 2
Views: 2593
Reputation: 705
On a Windows machine, I tried multiprocessing in which a global list variable is manipulated by several functions. In this example, the processes are unable to manipulate the global variable. We try to append new elements to a list from several functions running in parallel, but the list ends up empty:
from multiprocessing import Pool, freeze_support
import multiprocessing

global mylist
mylist = []

def f1(a):
    global mylist
    mylist.append(a)
    return mylist

def f2(a):
    global mylist
    mylist.append(a)
    return mylist

def f3(a):
    global mylist
    mylist.append(a)
    return mylist

if __name__ == '__main__':
    pool = Pool(multiprocessing.cpu_count())
    freeze_support()
    r1 = pool.apply_async(f1, [1])
    r2 = pool.apply_async(f2, [2])
    r3 = pool.apply_async(f3, [3])
    r4 = pool.apply_async(f1, [4])
    r5 = pool.apply_async(f2, [5])
    r6 = pool.apply_async(f3, [6])
    a1 = r1.get(timeout=234)
    a2 = r2.get(timeout=234)
    a3 = r3.get(timeout=234)
    a4 = r4.get(timeout=234)
    a5 = r5.get(timeout=234)
    a6 = r6.get(timeout=234)
    print(mylist)
The output is:
[]
My analysis: when you use multiprocessing, memory is divided into completely separate, independent address spaces, and there is no shared memory between the processes. That is simply how Python's multiprocessing works. The main process declares a global variable, and each subprocess then defines its own global variable in its own address space, even though the programmer sees only one global variable in the source.
But, no worries, here is a solution for the above-mentioned example:
from multiprocessing import Pool, freeze_support
import multiprocessing
from multiprocessing import Process, Manager

def f1(a):
    l = []
    l.append(a)
    return l

def f2(a):
    l = []
    l.append(a)
    return l

def f3(a):
    l = []
    l.append(a)
    return l

if __name__ == '__main__':
    pool = Pool(multiprocessing.cpu_count())
    freeze_support()
    r1 = pool.apply_async(f1, [1])
    r2 = pool.apply_async(f2, [2])
    r3 = pool.apply_async(f3, [3])
    r4 = pool.apply_async(f1, [4])
    r5 = pool.apply_async(f2, [5])
    r6 = pool.apply_async(f3, [6])
    a1 = r1.get(timeout=234)
    a2 = r2.get(timeout=234)
    a3 = r3.get(timeout=234)
    a4 = r4.get(timeout=234)
    a5 = r5.get(timeout=234)
    a6 = r6.get(timeout=234)
    mylist = a1 + a2 + a3 + a4 + a5 + a6
    print(mylist)
The output is:
[1, 2, 3, 4, 5, 6]
The solution is to have every parallel function return its output and combine the results in the main process. This is the only strategy I found that works reliably; other methods I tried, such as a multiprocessing Manager, did not work properly for me.
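For reference, the multiprocessing documentation does describe a working shared-list pattern using a Manager proxy that is passed explicitly to the workers; a common pitfall is appending to an ordinary global instead of the proxy. A minimal sketch (the function name append_item is hypothetical, not from the original code):

```python
from multiprocessing import Pool, Manager

def append_item(args):
    # The Manager's list proxy must be passed in explicitly; appending
    # to it forwards the change to the manager process, which holds
    # the one real list.
    shared, a = args
    shared.append(a)
    return a

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.list()   # proxy to a list living in the manager process
        with Pool(2) as pool:
            pool.map(append_item, [(shared, i) for i in [1, 2, 3]])
        # sorted() because the append order across workers is not deterministic
        print(sorted(shared))     # [1, 2, 3]
```

Whether this fits depends on the workload; for simple cases, returning results as above avoids the manager's IPC overhead entirely.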
Upvotes: 0
Reputation: 44108
TL;DR. You can skip to the last paragraph for the solution or read everything to understand what is actually going on.
You did not tag your question with your platform (e.g. windows or linux), as the guidelines for posting questions tagged with multiprocessing request that you do; the behavior of global variables very much depends on the platform.
On platforms that use the spawn method to create new processes, such as Windows, each process in the pool created by your pool = multiprocessing.Pool() statement gets a new, empty address space, and a new Python interpreter is launched that re-reads and re-executes the source program in order to initialize that address space before ultimately calling the worker function test. That means every statement at global scope, i.e. import statements, variable assignments, function definitions, etc., is executed for this purpose. However, in the new subprocess the variable __name__ will not be "__main__", so statements within the if __name__ == "__main__": block are not executed. That is why on Windows you must put the code that creates new processes within such a block; failure to do so would result in an infinite recursive process-creation loop if it were to go otherwise undetected.
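The re-execution under spawn can be made visible with a small sketch (function name show is illustrative): the worker sees the module-level value because the __main__ guard is skipped during the re-import.

```python
import multiprocessing

# Module level: runs once in the parent, and again in every spawned
# worker, because spawn re-imports this module.
globVar = 'ok'

def show(_):
    # Under spawn, each worker re-ran the module-level assignment,
    # so it sees 'ok' even though the parent changed globVar below.
    return globVar

if __name__ == "__main__":
    globVar = 'not ok'   # only the parent process executes this line
    ctx = multiprocessing.get_context("spawn")   # force spawn on any platform
    with ctx.Pool(2) as pool:
        print(pool.map(show, [0, 1]))   # under spawn: ['ok', 'ok']
```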
So if you are running under Windows, your main process set globVar to 'not ok' just prior to creating the pool. But when the pool's processes were initialized, prior to calling test, your source was re-executed, and each process, which runs in its own address space and therefore has its own copy of globVar, re-initialized that variable back to 'ok'. That is the value test sees, and, for the same reason, modifying that local copy of globVar will not be reflected back to the main process.
Now, on platforms that use fork to create new processes, such as Linux, things are a bit different. When the subprocesses are created, each one inherits the address space of the parent process as read-only, and only when it attempts to modify memory does it get its own copy ("copy on write"). This is clearly a more efficient process-creation mechanism. So in this case test will see globVar with a value of 'not ok', because that was its value at the time the subprocesses were created. But if test updates globVar, the copy-on-write mechanism ensures that it is updating a globVar that exists in the child's own address space, so again the main process will not see the updated value.
So having worker functions return values, as your test function already does, is the standard way of reflecting results back to the main process. Your remaining problem is that the workers do not start with the globVar value you expected. This can be solved by initializing the pool's processes with the correct globVar value, using the initializer and initargs arguments of the Pool constructor (see the documentation):
import multiprocessing

global globVar
globVar = 'ok'

def init_processes(gVar):
    global globVar
    globVar = gVar

def test(arg1):
    print(arg1)
    return globVar

if __name__ == "__main__":
    globVar = 'not ok'

    # Sequential
    print(test(0))

    # Parallel
    pool = multiprocessing.Pool(initializer=init_processes, initargs=(globVar,))
    argList = [0, 1, 2]
    result = pool.map(test, argList)
    pool.close()
    print(result)
Prints:
0
not ok
0
1
2
['not ok', 'not ok', 'not ok']
Upvotes: 2