glo

Reputation: 85

How do global variables work in parallel programming with Python?

I have this code. In the sequential approach the message "not ok" is printed, while in the parallel approach ["ok", "ok", "ok"] is printed instead of the ["not ok", "not ok", "not ok"] that I expected.

How can I change the variable globVar without passing it as an argument to the "test" function?

import multiprocessing

global globVar
globVar = 'ok'

def test(arg1):
    print(arg1)
    return globVar
    
if __name__ == "__main__" :
    globVar = 'not ok'

    #Sequential
    print(test(0))

    #Parallel
    pool = multiprocessing.Pool()
    argList = [0,1,2]
    result = pool.map(test,argList)
    pool.close()
    print(result)

Upvotes: 2

Views: 2593

Answers (2)

Farzad Amirjavid

Reputation: 705

On a Windows machine, I tried a multiprocessing example in which several functions manipulate a global list variable. In this example the global variable is not modified: we try to append new elements to a list from several functions running in parallel, but the list is still empty at the end:

from multiprocessing import Pool, freeze_support
import multiprocessing
global mylist
mylist = []
def f1(a):
    global mylist
    mylist.append(a)
    return mylist
def f2(a):
    global mylist
    mylist.append(a)
    return mylist
def f3(a):
    global mylist
    mylist.append(a)
    return mylist
if __name__ == '__main__':
    freeze_support()  # should come first in the main block; only matters for frozen Windows executables
    pool = Pool(multiprocessing.cpu_count())
    r1 = pool.apply_async(f1,[1])
    r2 = pool.apply_async(f2,[2])
    r3 = pool.apply_async(f3,[3])
    r4 = pool.apply_async(f1,[4])
    r5 = pool.apply_async(f2,[5])
    r6 = pool.apply_async(f3,[6])
    a1 = r1.get(timeout=234)
    a2 = r2.get(timeout=234)
    a3 = r3.get(timeout=234)
    a4 = r4.get(timeout=234)
    a5 = r5.get(timeout=234)
    a6 = r6.get(timeout=234)
    print(mylist)

The output is:

[]

My analysis is that when you use multiprocessing, each process gets its own completely separate, independent memory space, and no memory is shared between the processes by default. That is how Python's multiprocessing works. The main process declares a global variable, and each subprocess then defines its own global variable of the same name in its own address space. The programmer, however, sees only one global variable in the source.
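One way to observe this separation is to have each worker report its process ID together with a snapshot of its own copy of the list. This is only a minimal sketch; the names items and append_and_report are made up for the illustration:

```python
import os
from multiprocessing import Pool

items = []  # every process ends up with its own independent copy of this list


def append_and_report(value):
    # Appends to the copy of `items` that lives in whichever process runs this call.
    items.append(value)
    return os.getpid(), list(items)


if __name__ == "__main__":
    with Pool(2) as pool:
        for pid, snapshot in pool.map(append_and_report, [1, 2, 3, 4]):
            print("worker", pid, "sees", snapshot)
    # The parent's copy was never touched by the workers.
    print("parent", os.getpid(), "sees", items)
```

The worker snapshots depend on how the items happen to be distributed, but the parent's list should still be empty at the end, because the appends happened in other processes.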

But, no worries, here is a solution for the above-mentioned example:

from multiprocessing import Pool, freeze_support
import multiprocessing
def f1(a):
    l = []
    l.append(a)
    return l
def f2(a):
    l = []
    l.append(a)
    return l
def f3(a):
    l = []
    l.append(a)
    return l
if __name__ == '__main__':
    freeze_support()  # should come first in the main block; only matters for frozen Windows executables
    pool = Pool(multiprocessing.cpu_count())
    r1 = pool.apply_async(f1,[1])
    r2 = pool.apply_async(f2,[2])
    r3 = pool.apply_async(f3,[3])
    r4 = pool.apply_async(f1,[4])
    r5 = pool.apply_async(f2,[5])
    r6 = pool.apply_async(f3,[6])
    a1 = r1.get(timeout=234)
    a2 = r2.get(timeout=234)
    a3 = r3.get(timeout=234)
    a4 = r4.get(timeout=234)
    a5 = r5.get(timeout=234)
    a6 = r6.get(timeout=234)
    mylist = a1 + a2 + a3 + a4 + a5 + a6
    print(mylist)

The output is:

[1, 2, 3, 4, 5, 6]

The solution is to have every function that runs in parallel return its result, and to combine the returned values in the main process.
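The same idea can probably be written more compactly with a single worker function and pool.map, which returns the results in input order. This is a sketch only; make_items is a hypothetical stand-in for f1, f2 and f3, which all do the same thing:

```python
from itertools import chain
from multiprocessing import Pool


def make_items(a):
    # Each call builds and returns its own small list; nothing global is touched.
    return [a]


if __name__ == "__main__":
    with Pool(3) as pool:
        parts = pool.map(make_items, [1, 2, 3, 4, 5, 6])
    # Flatten the per-call lists into one list in the main process.
    mylist = list(chain.from_iterable(parts))
    print(mylist)  # [1, 2, 3, 4, 5, 6]
```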

This is the only strategy that I found to work with Python. The other methods I tried, such as a multiprocessing Manager, did not work properly for me.
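For comparison, here is how a Manager-based version is usually sketched: the managed list is created in the main process and passed to the workers explicitly as an argument, rather than reached through a global. The name append_item is hypothetical, and this is offered as an assumption about what should work rather than something verified on the setup described above:

```python
from multiprocessing import Manager, Pool


def append_item(shared, item):
    # `shared` is a proxy; the append is forwarded to the manager process.
    shared.append(item)


if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.list()
        with Pool(3) as pool:
            pool.starmap(append_item, [(shared, i) for i in range(1, 7)])
        # Read the contents while the manager is still running.
        # Sorted because the order of the appends is not guaranteed.
        print(sorted(shared))
```

Every append goes through the manager process, so this is typically slower than returning values from the workers.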

Upvotes: 0

Booboo

Reputation: 44108

TL;DR. You can skip to the last paragraph for the solution or read everything to understand what is actually going on.

You did not tag your question with your platform (e.g. windows or linux), as the guidelines for posting questions tagged with multiprocessing request that you do; the behavior ("behaviour" for Anglos) of global variables very much depends on the platform.

On platforms that use the spawn method to create new processes, such as Windows, each process in the pool created by your pool = multiprocessing.Pool() statement is set up as follows: a new, empty address space is created and a new Python interpreter is launched that re-reads and re-executes the source program to initialize that address space before ultimately calling the worker function test. That means that every statement at global scope (import statements, variable declarations, function declarations, etc.) is executed for this purpose. However, in the new subprocess the variable __name__ will not be "__main__", so any statements within the if __name__ == "__main__" : block will not be executed. That is why on Windows platforms you must put code that creates new processes within such a block. Failing to do so would result in an infinite recursive process-creation loop if it were to go otherwise undetected.

So if you are running under Windows, your main process has set globVar to 'not ok' just prior to creating the pool. But when the pool's processes are initialized prior to calling test, your source is re-executed and each process, which runs in its own address space and therefore has its own copy of globVar, re-initializes that variable back to 'ok'. That is the value that test will see. Because each process has its own copy, it also follows that modifying that copy of globVar in a worker will not be reflected back to the main process.

Now on platforms that use fork to create new processes, such as Linux, things are a bit different. When the subprocesses are created, each one inherits the address space of the parent process as read-only and only when it attempts to modify memory does it get a copy ("copy on write"). This is clearly a more efficient process-creating mechanism. So in this case test will see globVar having a value of 'not ok' because that was the value it had at the time the subprocesses were created. But if test updates globVar, the "copy on write" mechanism will ensure that it is updating a globVar that exists in a local address space. So again the main process will not see the updated value.
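To see both behaviors on one machine, a minimal sketch can request a start method explicitly with multiprocessing.get_context. The names glob_var, read_glob and run are made up for this illustration, and the fork branch is only attempted where that start method is available:

```python
import multiprocessing

glob_var = "ok"  # module-level default, re-created whenever the module is re-executed


def read_glob(_):
    # Returns whatever value glob_var has in the process running this call.
    return glob_var


def run(start_method):
    # Build a pool whose workers are created with the requested start method.
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(2) as pool:
        return pool.map(read_glob, range(3))


if __name__ == "__main__":
    glob_var = "not ok"  # changed only in the main process, before any pool exists
    print("spawn:", run("spawn"))
    if "fork" in multiprocessing.get_all_start_methods():
        print("fork: ", run("fork"))
```

On Linux I would expect the spawn run to return 'ok' for every item and the fork run to return 'not ok', matching the two descriptions above.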

So having worker functions return values, as your test function does, is a standard way of passing results back to the main process. Your problem is that the worker processes are not starting with the globVar value that you expected. This can be solved by initializing the pool's processes with the correct globVar value using the initializer and initargs arguments to the Pool constructor (see the documentation):

import multiprocessing

global globVar
globVar = 'ok'

def init_processes(gVar):
    global globVar
    globVar = gVar

def test(arg1):
    print(arg1)
    return globVar

if __name__ == "__main__" :
    globVar = 'not ok'

    #Sequential
    print(test(0))

    #Parallel
    pool = multiprocessing.Pool(initializer=init_processes, initargs=(globVar,))
    argList = [0,1,2]
    result = pool.map(test,argList)
    pool.close()
    print(result)

Prints:

0
not ok
0
1
2
['not ok', 'not ok', 'not ok']

Upvotes: 2
