meTchaikovsky
meTchaikovsky

Reputation: 7676

Why multiprocessing.Pool cannot change global variable?

I want to use multiprocessing.Pool to load a large dataset, here is the code I'm using:

import os
from os import listdir
import pickle
from os.path import join
import multiprocessing as mp

db_path = db_path
the_files = listdir(db_path)
fp_dict = {}
def loader(the_hash):
        global fp_dict
        the_file = join(db_path, the_hash)
        with open(the_file, 'rb') as source:
                fp_dict[the_hash] = pickle.load(source)
        print(len(fp_dict))
def parallel(the_func, the_args):
        global fp_dict
        pool = mp.Pool(mp.cpu_count())
        pool.map(the_func, the_args)
        print(len(fp_dict))
parallel(loader, the_files)

Interestingly, the length of fp_dict is changing while the code is running. However, as long as the process terminates, the length of fp_dict is zero. Why? How can I modify a global variable by using multiprocessing.Pool?

Upvotes: 2

Views: 2575

Answers (1)

steveha
steveha

Reputation: 76725

Because you are using multiprocessing.Pool your program runs in multiple processes. Each process has its own copy of the global variable, each process modifies its own copy of the global variable, and when the work is finished each process is terminated. The master process never modified its copy of the global variable.

If you want to collect information about what happened inside each worker process, you should use the .map() method function, and return a tuple of data from each worker. Then have the master collect the tuples and put together a dictionary from the data.

Here's a YouTube tutorial that walks through using multiprocessing.Pool().map() to collect the output from a worker function.

https://www.youtube.com/watch?v=_1ZwkCY9wxk

Here's another answer I wrote for StackOverflow, showing how to pass tuples so the worker function can take multiple arguments; and showing how to return a tuple with multiple values from the worker function. It even make a dictionary out of the returned values.

https://stackoverflow.com/a/11025090/166949

Upvotes: 7

Related Questions