user2313602

Reputation: 303

How to use Values in a multiprocessing pool with Python

I want to be able to use Value from the multiprocessing library to keep track of data. As far as I know, when it comes to multiprocessing in Python, each process has its own copy, so I can't edit global variables. I want to use Value to solve this issue. Does anyone know how I can pass a Value into a pooled function?

from multiprocessing import Pool, Value
from functools import partial

arr = [2,6,8,7,4,2,5,6,2,4,7,8,5,2,7,4,2,5,6,2,4,7,8,5,2,9,3,2,0,1,5,7,2,8,9,3,2]

def hello(data, g):
    data.value += 1

if __name__ == '__main__':
    data = Value('i', 0)
    func = partial(hello, data)
    p = Pool(processes=1)
    p.map(func, arr)

    print data.value

Here is the runtime error I'm getting:

RuntimeError: Synchronized objects should only be shared between processes through inheritance

Does anyone know what I'm doing wrong?

Upvotes: 9

Views: 7314

Answers (2)

Roland Smith

Reputation: 43495

There is little need to use Values with Pool.map().

The central idea of a map is to apply a function to every item in a list or other iterable, gathering the return values in a list.

The idea behind Pool.map is basically the same, but spread out over multiple processes. Each worker process calls the mapped function with items from the iterable. The return values are transported back to the parent process and gathered into a list, which is eventually returned.
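
For instance, instead of incrementing a shared counter, the worker can just return something per item and the parent can count what comes back. A minimal sketch (the body of hello here is invented for illustration):

from multiprocessing import Pool

arr = [2, 6, 8, 7, 4, 2, 5]

def hello(g):
    # Do the per-item work and return a result instead of
    # mutating shared state.
    return g + 1

if __name__ == '__main__':
    p = Pool(processes=4)
    results = p.map(hello, arr)  # one result per item, in input order
    print results                # -> [3, 7, 9, 8, 5, 3, 6]
    print len(results)           # the count falls out of the list length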


Alternatively, you could use Pool.imap_unordered, which starts returning results as soon as they are available instead of waiting until everything is finished. You could then tally the number of results returned so far and use that to drive a progress indicator.
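
A sketch of that pattern, reusing the toy hello above; the progress message is just one way you might report it:

from multiprocessing import Pool

arr = [2, 6, 8, 7, 4, 2, 5]

def hello(g):
    return g + 1

if __name__ == '__main__':
    p = Pool(processes=4)
    done = 0
    # imap_unordered yields each result as soon as a worker finishes,
    # in completion order rather than input order.
    for result in p.imap_unordered(hello, arr):
        done += 1
        print 'completed %d of %d tasks' % (done, len(arr))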

Upvotes: -1

Tom Dalton

Reputation: 6190

I don't know why, but there seems to be some issue when using Pool that you don't get when creating subprocesses manually, e.g. the following works:

from multiprocessing import Process, Value

arr = [1,2,3,4,5,6,7,8,9]


def hello(data, g):
    # Take the Value's lock so the increment is atomic across processes.
    with data.get_lock():
        data.value += 1
    print id(data), g, data.value

if __name__ == '__main__':
    data = Value('i', 0)
    print id(data)

    # Each child inherits the Value at fork time, which is the sharing
    # mode the error message is asking for.
    processes = []
    for n in arr:
        p = Process(target=hello, args=(data, n))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print "sub process tasks completed"
    print data.value

However, if you do basically the same thing using Pool, you get the error "RuntimeError: Synchronized objects should only be shared between processes through inheritance". I have seen that error when using a pool before, and never fully got to the bottom of it.

An alternative to using Value that does seem to work with Pool is to use a Manager to give you a 'shared' list (a Manager hands out proxy objects that can be pickled and sent to the workers, while the actual list lives in a separate server process):

from multiprocessing import Pool, Manager
from functools import partial


arr = [1,2,3,4,5,6,7,8,9]


def hello(data, g):
    # Each access goes through the manager proxy; note that this
    # read-then-write is not atomic across workers.
    data[0] += 1


if __name__ == '__main__':
    m = Manager()
    data = m.list([0])
    hello_data = partial(hello, data)
    p = Pool(processes=5)
    p.map(hello_data, arr)

    print data[0]

Upvotes: 11
