Librarian

Reputation: 198

Python - Global variable modified prior to multiprocessing call is passed as original state

In Python 3.5 on Windows, I am attempting to design some multiprocessing code which requires some pre-processed variables to be available to the function that is applied to the input.
To make these variables available, I am treating them as global variables.

While this works in a non-parallel approach, multiprocessing.Pool behaves as if the global had never been modified after its initialization.

Consider the following snippet:

from multiprocessing import Pool

testlist = []


def f(x):
    return x*x + testlist[0]


def main():
    global testlist
    input_iter = range(10)
    testlist = [1, 2, 3, 4, 5]
    for i in input_iter:
        print(f(i))
    with Pool(2) as pool:
        for i in pool.imap_unordered(f, input_iter):
            print(i)

if __name__ == '__main__':
    main()

The function f(x) simply squares the input, and adds an element from the global variable testlist. testlist is defined globally first as an empty list, and is then modified to contain the list [1, 2, 3, 4, 5] in the main() function.

Running this code will produce the desired output for the simple for loop, but the multiprocessing loop will throw an IndexError: to the Pool workers, the testlist variable has not been modified to contain values and is still an empty list.

1
2
5
10
17
26
37
50
65
82
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "progresstest.py", line 7, in f
    return x*x + testlist[0]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "progresstest.py", line 21, in <module>
    main()
  File "progresstest.py", line 17, in main
    for i in pool.imap_unordered(f, input_iter):
  File "\lib\multiprocessing\pool.py", line 695, in next
    raise value
IndexError: list index out of range

The global variable is modified prior to any creation of the Pool workers, and the simple loop shows that this assignment worked: no IndexError is thrown in the for loop. I understand that state cannot be shared between processes, but this variable is defined prior to the parallel execution and does not change.

Why is this happening?

Upvotes: 2

Views: 1020

Answers (2)

Marc Steffen

Reputation: 133

I'd wrap the values you want to pass in tuples (as Roland suggests), or in a list, which may be more flexible to handle. Here is an example of multiprocessing with a nested list, where every work item carries the shared values along with its input:

import multiprocessing
import os
from time import sleep


def foo(transptbox):
    # Each work item is [shared_values, a]; unpack both parts.
    myvalues, a = transptbox
    for j in myvalues:
        val1, val2, val3, val4, val5 = j
    a = a*a + val1  # stands for your process with iterable a and value 1
    sleep(0.1)  # using sleep to simulate processing time long enough to activate multiprocessing
    print(f"Process-ID {os.getpid()}:{a} : {val1} {val2} {val3} {val4} {val5}")  # just to show values are available
    return a


if __name__ == '__main__':
    values_to_be_sent = [[1, 2, 3, 4, 5]]
    input_iter = range(10)
    # Pair the shared values with each input so they are sent to the workers explicitly.
    tbox = [[values_to_be_sent, x] for x in input_iter]
    # The with block shuts the pool down once the results are collected.
    with multiprocessing.Pool(processes=3) as pool:
        result = pool.map(foo, tbox)
    print(result)

The output is the result you wanted to have:

C:\Users\757\PycharmProjects\exu\venv\Scripts\python.exe C:/Users/757/.PyCharmCE2018.1/config/scratches/scratch_9.py
Process-ID 3476:1 : 1 2 3 4 5
Process-ID 4416:5 : 1 2 3 4 5
Process-ID 5568:2 : 1 2 3 4 5
Process-ID 3476:10 : 1 2 3 4 5
Process-ID 4416:17 : 1 2 3 4 5
Process-ID 5568:26 : 1 2 3 4 5
Process-ID 3476:37 : 1 2 3 4 5
Process-ID 5568:50 : 1 2 3 4 5
Process-ID 4416:65 : 1 2 3 4 5
Process-ID 3476:82 : 1 2 3 4 5
[1, 2, 5, 10, 17, 26, 37, 50, 65, 82]

Process finished with exit code 0
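
A more compact variant of the same idea, as a rough sketch: instead of building a nested list, the shared values can be bound to the worker function with functools.partial. The names f and values here are only illustrative and the function signature differs from the one in the question.

```python
from functools import partial
from multiprocessing import Pool


def f(x, values):
    # 'values' arrives as an explicit argument, so no global state is needed.
    return x * x + values[0]


def main():
    values = [1, 2, 3, 4, 5]
    # The partial object is pickled and sent to the workers together with each task.
    with Pool(2) as pool:
        return pool.map(partial(f, values=values), range(10))


if __name__ == '__main__':
    print(main())
```

Because the values travel with the task, this should behave the same whether the workers are forked or started as fresh interpreters.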

Upvotes: 0

Roland Smith

Reputation: 43495

On UNIX-like operating systems, multiprocessing (by default) uses the fork system call when creating the Pool, making one or more exact copies of the parent process. Those copies include the current value of every global.
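
To check which start method a given interpreter actually uses, something like the following sketch may help. Note that the default depends on the platform and the Python version (newer releases have moved some platforms away from fork), so the printed value is not guaranteed to match the description above.

```python
import multiprocessing as mp

if __name__ == '__main__':
    # Start method used by default here: 'fork', 'spawn' or 'forkserver'.
    print(mp.get_start_method())
    # All start methods this platform supports.
    print(mp.get_all_start_methods())
```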

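On Windows there is no fork system call, so multiprocessing works differently. It starts a new Python process that imports the original program as a module. During that import the if __name__ == '__main__' guard is false, so main() is not called and the assignment to your global never runs in the worker: testlist is still the empty list from the module-level definition.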
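
One possible way around this, sketched here under the assumption that the data is picklable and small enough to send to each worker once, is to pass it through the Pool's initializer and initargs parameters. The helper name init_worker is made up for this example.

```python
from multiprocessing import Pool

testlist = []


def init_worker(values):
    # Runs once in every worker process and rebinds the global there.
    global testlist
    testlist = values


def f(x):
    return x * x + testlist[0]


def main():
    values = [1, 2, 3, 4, 5]
    # initargs is pickled and handed to init_worker in each new worker.
    with Pool(2, initializer=init_worker, initargs=(values,)) as pool:
        return sorted(pool.imap_unordered(f, range(10)))


if __name__ == '__main__':
    print(main())
```

The global is now set inside each worker rather than only in the parent, so it no longer matters whether the worker started as a copy of the parent or as a fresh interpreter.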

Upvotes: 2
