Reputation: 303
I want to use Value from the multiprocessing library to keep track of data. As far as I know, when it comes to multiprocessing in Python, each process has its own copy, so I can't edit global variables. I want to use Value to solve this issue. Does anyone know how I can pass Value data into a pooled function?
from multiprocessing import Pool, Value
from functools import partial
import itertools

arr = [2,6,8,7,4,2,5,6,2,4,7,8,5,2,7,4,2,5,6,2,4,7,8,5,2,9,3,2,0,1,5,7,2,8,9,3,2]

def hello(g, data):
    data.value += 1

if __name__ == '__main__':
    data = Value('i', 0)
    func = partial(hello, data)
    p = Pool(processes=1)
    p.map(hello, itertools.izip(arr, itertools.repeat(data)))
    print data.value
Here is the runtime error I'm getting:
RuntimeError: Synchronized objects should only be shared between processes through inheritance
Does anyone know what I'm doing wrong?
Upvotes: 9
Views: 7314
Reputation: 43495
There is little need to use a Value with Pool.map(). The central idea of a map is to apply a function to every item in a list or other iterable, gathering the return values in a list. The idea behind Pool.map is basically the same, but spread out over multiple processes: in every worker process, the mapped function gets called with items from the iterable. The return values from the functions called in the worker processes are transported back to the parent process and gathered in a list, which is eventually returned.

Alternatively, you could use Pool.imap_unordered, which starts returning results as soon as they are available instead of waiting until everything is finished. You could then tally the number of returned results and use that to update a progress bar.
Upvotes: -1
Reputation: 6190
I don't know why, but there seems to be some issue using a Pool that you don't get when creating subprocesses manually. For example, the following works:
from multiprocessing import Process, Value

arr = [1,2,3,4,5,6,7,8,9]

def hello(data, g):
    with data.get_lock():
        data.value += 1
    print id(data), g, data.value

if __name__ == '__main__':
    data = Value('i')
    print id(data)
    processes = []
    for n in arr:
        p = Process(target=hello, args=(data, n))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print "sub process tasks completed"
    print data.value
However, if you do basically the same thing using a Pool, you get the error "RuntimeError: Synchronized objects should only be shared between processes through inheritance". I have seen that error when using a pool before, and never fully got to the bottom of it.
An alternative to using a Value that seems to work with Pool is to use a Manager to give you a 'shared' list:
from multiprocessing import Pool, Manager
from functools import partial

arr = [1,2,3,4,5,6,7,8,9]

def hello(data, g):
    data[0] += 1

if __name__ == '__main__':
    m = Manager()
    data = m.list([0])
    hello_data = partial(hello, data)
    p = Pool(processes=5)
    p.map(hello_data, arr)
    print data[0]
Upvotes: 11