Reputation: 198
In Python 3.5 on Windows, I am attempting to design some multiprocessing code which requires some pre-processed variables to be available to the function that is applied to the input.
To make these variables available, I am treating them as global variables.
While this works in a non-parallel approach, using multiprocessing.Pool shows the behavior I would expect if the global had never been modified from its initialization.
Consider the following snippet:
from multiprocessing import Pool

testlist = []


def f(x):
    return x*x + testlist[0]


def main():
    global testlist
    input_iter = range(10)
    testlist = [1, 2, 3, 4, 5]
    for i in input_iter:
        print(f(i))
    with Pool(2) as pool:
        for i in pool.imap_unordered(f, input_iter):
            print(i)

if __name__ == '__main__':
    main()
The function f(x) simply squares the input and adds an element from the global variable testlist. testlist is first defined globally as an empty list, and is then set to [1, 2, 3, 4, 5] in the main() function.
Running this code produces the desired output for the simple for loop, but the multiprocessing loop throws an IndexError: to the Pool workers, the testlist variable has not been modified to contain values and is still an empty list.
1
2
5
10
17
26
37
50
65
82
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "progresstest.py", line 7, in f
    return x*x + testlist[0]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "progresstest.py", line 21, in <module>
    main()
  File "progresstest.py", line 17, in main
    for i in pool.imap_unordered(f, input_iter):
  File "\lib\multiprocessing\pool.py", line 695, in next
    raise value
IndexError: list index out of range
The global variable is modified before any of the Pool workers are created, and the simple loop shows that this assignment worked: no IndexError is thrown in the for loop. I understand that state cannot be shared between processes, but this variable is defined prior to the parallel execution and does not change.
Why is this happening?
Upvotes: 2
Views: 1020
Reputation: 133
I'd wrap the values you want to pass in tuples (as Roland suggests) or in a list, which may offer more flexible handling. Here is an example of multiprocessing with a nested list:
import multiprocessing
import os
from time import sleep


def foo(transptbox):
    # Each work item carries the shared values plus one input value.
    myvalues, a = transptbox
    for j in myvalues:
        val1, val2, val3, val4, val5 = j
        a = a*a + val1  # stands for your process with iterable a and value 1
        sleep(0.1)  # simulate processing time long enough for the work to be spread across workers
        print(f"Process-ID {os.getpid()}:{a} : {val1} {val2} {val3} {val4} {val5}")  # just to show values are available
    return a


if __name__ == '__main__':
    values_to_be_sent = [[1, 2, 3, 4, 5]]
    input_iter = range(10)
    # Pair every input with the shared values so each task is self-contained.
    tbox = [[values_to_be_sent, x] for x in input_iter]
    # The with block shuts the pool down once map() has returned.
    with multiprocessing.Pool(processes=3) as pool:
        result = pool.map(foo, tbox)
    print(result)
The output is the result you wanted to have:
C:\Users\757\PycharmProjects\exu\venv\Scripts\python.exe C:/Users/757/.PyCharmCE2018.1/config/scratches/scratch_9.py
Process-ID 3476:1 : 1 2 3 4 5
Process-ID 4416:5 : 1 2 3 4 5
Process-ID 5568:2 : 1 2 3 4 5
Process-ID 3476:10 : 1 2 3 4 5
Process-ID 4416:17 : 1 2 3 4 5
Process-ID 5568:26 : 1 2 3 4 5
Process-ID 3476:37 : 1 2 3 4 5
Process-ID 5568:50 : 1 2 3 4 5
Process-ID 4416:65 : 1 2 3 4 5
Process-ID 3476:82 : 1 2 3 4 5
[1, 2, 5, 10, 17, 26, 37, 50, 65, 82]
Process finished with exit code 0
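If packing the shared values into every work item feels clumsy, one possible alternative is to bind them to the function with functools.partial. This is only a minimal sketch: the two-argument f and the name worker are made up for illustration and are not from the question. Note that the bound values are pickled and sent along with the tasks, so this suits small, read-only data best.

```python
from functools import partial
from multiprocessing import Pool


def f(values, x):
    # values is bound once by partial(); x comes from the input iterable.
    return x * x + values[0]


def main():
    values = [1, 2, 3, 4, 5]
    # Hypothetical name: worker is f with its first argument already filled in.
    worker = partial(f, values)
    with Pool(3) as pool:
        # map() returns results in input order.
        print(pool.map(worker, range(10)))  # prints [1, 2, 5, 10, 17, 26, 37, 50, 65, 82]


if __name__ == '__main__':
    main()
```

Because f no longer reads a global, it behaves the same whether the workers are forked or started as fresh processes.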
Upvotes: 0
Reputation: 43495
On UNIX-like operating systems, multiprocessing (by default, at least in the Python 3.5 you are using) uses the fork system call when creating the Pool to make one or more exact copies of the parent process, so the workers inherit the already-modified global.
Windows has no fork system call, so multiprocessing works differently there. It starts a new Python process that imports the original program as a module. During that import the if __name__ == '__main__' guard is false, so main() is not called and your global is never updated: each worker still sees the empty list.
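One way to keep the global-variable design and still have it work on Windows is to set the global inside each worker through the initializer and initargs parameters of Pool. The following is a minimal sketch of that idea, not the only fix; the function name init_worker is hypothetical, and sorted() is used only to make the unordered results easy to read.

```python
from multiprocessing import Pool

testlist = []  # module-level default; a freshly started worker begins with this


def init_worker(values):
    """Hypothetical helper: runs once in every worker and sets its global."""
    global testlist
    testlist = values


def f(x):
    return x * x + testlist[0]


def main():
    values = [1, 2, 3, 4, 5]
    # initargs is pickled and passed to init_worker in each new worker process.
    with Pool(2, initializer=init_worker, initargs=(values,)) as pool:
        print(sorted(pool.imap_unordered(f, range(10))))
        # prints [1, 2, 5, 10, 17, 26, 37, 50, 65, 82]


if __name__ == '__main__':
    main()
```

The initializer runs in the worker itself, so it does not matter that main() is skipped when the module is imported there.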
Upvotes: 2