Lostsoul

Reputation: 25991

How can I access a shared dictionary with multiprocessing?

I think I am following the Python documentation correctly, but I am having trouble getting the result I am looking for. I basically have a list of numbers that are passed to a function with nested for loops, and the output is saved in a dictionary.

Here's the code:

from multiprocessing import Pool, Manager

list = [1,2,3,10]
dictionary = {}
def test(x, dictionary):
    for xx in range(100):
        for xxx in range(100):
            dictionary[x]=xx*xxx



if __name__ == '__main__':
    pool = Pool(processes=4)
    mgr = Manager()
    d = mgr.dict()
    for N in list:
        pool.apply_async(test, (N, d))

    # Mark pool as closed -- no more tasks can be added.
    pool.close()

    # Wait for tasks to exit
    pool.join()

    # Output results
    print d

Here's the expected result:

{1: 9801, 2: 9801, 3: 9801, 10: 9801}

Any suggestions as to what I'm doing wrong? Also, I haven't convinced myself that shared resources are the best approach (I'm thinking of using a database to maintain state), so if my approach is completely flawed or there's a better way to do this in Python, please let me know.

Upvotes: 6

Views: 1792

Answers (1)

Eli Bendersky

Reputation: 273366

Change the definition of test to:

def test(x, d):
    for xx in range(100):
        for xxx in range(100):
            d[x]=xx*xxx

Otherwise you're just writing into some global dictionary (without synchronization) and never accessing it later.
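
For reference, here's a minimal sketch of the whole script with that one change applied (untested, keeping the original Python 2 style; I've also renamed the list to numbers only to avoid shadowing the built-in list):

from multiprocessing import Pool, Manager

def test(x, d):
    for xx in range(100):
        for xxx in range(100):
            d[x] = xx * xxx

if __name__ == '__main__':
    numbers = [1, 2, 3, 10]
    pool = Pool(processes=4)
    mgr = Manager()
    d = mgr.dict()           # manager-backed dict shared between processes
    for N in numbers:
        pool.apply_async(test, (N, d))

    pool.close()             # no more tasks can be added
    pool.join()              # wait for all workers to finish

    print dict(d)            # copy the proxy into a plain dict for printing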


As for the general approach, I think this one in particular has a lot of contention on the shared dictionary. Do you really have to update it from each process that frequently? Accumulating batches of partial results in each process and only updating the shared object once in a while should perform better.
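
A rough sketch of what I mean, assuming each task can build its result locally before anything is shared (the name test_local and the use of imap_unordered are just one way to arrange it):

from multiprocessing import Pool

def test_local(x):
    local = {}                       # private to this worker; no contention
    for xx in range(100):
        for xxx in range(100):
            local[x] = xx * xxx
    return local                     # hand the whole batch back at once

if __name__ == '__main__':
    pool = Pool(processes=4)
    results = {}
    for partial in pool.imap_unordered(test_local, [1, 2, 3, 10]):
        results.update(partial)      # merge once per task, in the parent
    pool.close()
    pool.join()
    print results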

Upvotes: 3
