Reputation: 1025
I have a dictionary my_dict containing lists, and an iterable keys with a lot of keys which I would like to run a function on:
for key in keys:
    if key in my_dict:
        my_dict[key].append(my_fun(key, params))
    else:
        my_dict[key] = [my_fun(key, params)]
my_fun is slow. How do I parallelize this loop?
Is it just:
import multiprocessing

def _process_key(key):
    if key in my_dict:
        my_dict[key].append(my_fun(key, params))
    else:
        my_dict[key] = [my_fun(key, params)]

if __name__ == '__main__':
    with multiprocessing.Pool(5) as p:
        p.map(_process_key, keys)
Upvotes: 0
Views: 200
Reputation: 40894
Python is not good at CPU-bound multithreading because of the GIL. If you want to speed up a CPU-bound computation, use multiprocessing.
I would split the keys of your dictionary into as many lists as you have cores available. Then I would pass these lists to subprocesses, along with the original dictionary, or a relevant part of it (if values are large object graphs).
The subprocesses would return partial results, which the main process would merge into a single result.
For I/O-bound computations, the same approach would work using threading, which could be faster because the data would be directly shared between threads.
The above is pretty generic. I don't know how to best partition your key space for even load and maximum speedup; you have to experiment.
Upvotes: 0
Reputation: 77347
The dict is in the parent memory space, so you need to update it there. pool.map iterates through whatever is returned by the worker function, so just have it return the results in a useful form. collections.defaultdict is a helper that creates items for you, so you can do:
import multiprocessing
import collections

def _process_key(key):
    # Do the slow work in the child process and hand the result back.
    return key, my_fun(key, params)

if __name__ == '__main__':
    with multiprocessing.Pool(5) as p:
        my_dict = collections.defaultdict(list)
        for key, val in p.map(_process_key, keys):
            my_dict[key].append(val)
Upvotes: 2