Chris

Reputation: 31266

Multiprocessing scope: list not updating using 'multiprocessing.Process', worked using 'threading.Thread'

I have a situation in multiprocessing where the list I use to collect the results from my function is not getting updated by the process. Correction: the same code updates the list properly when the workers are created with 'Thread', but fails when they are created with 'Process'. I have two examples of code below. I cannot detect any kind of error. I think this might be a subtlety of scope that I don't understand.


Here is the first example. Correction: I originally called this the working example, but it does not work with Process either; it works with threading.Thread, however.

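from multiprocessing import Process  # swap in threading.Thread for the threaded version

def run_knn_result_wrapper(dataset,k_value,metric,results_list,index):
    # Store this worker's result in its slot of the caller-supplied list.
    results_list[index] = knn_result(dataset,k_value,metric)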

results = [None] * (k_upper-k_lower)
threads = [None] * (k_upper-k_lower)
joined = [0] * (k_upper-k_lower)

for i in range(len(threads)):
    threads[i] = Process(target=run_knn_result_wrapper,args=(dataset,k_lower+i,metric,results,i))
    threads[i].start()
    if batch_size == 1:
        threads[i].join()
        joined[i]=1
    else:

        if i % batch_size == batch_size-1 and i > 0:
            for j in range(max(0,i - 2),i):
                if joined[j] == 0:
                    threads[j].join()
                    joined[j] = 1
for i in range(len(threads)):
    if joined[i] == 0:
        threads[i].join()


Ignoring the "threads" variable name (this started on threading, but then I found out about the GIL), the `results` list updates perfectly when the workers are threads.

Here is the code which does not update the results list:

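import numpy as np
from multiprocessing import Process

def prediction_on_batch_wrapper(batchX,results_list,index):
    # Store this batch's predictions in its slot of the caller-supplied list.
    results_list[index] = prediction_on_batch(batchX)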



batches_of_X = np.array_split(X,10)

overall_predicted_classes_list = []
for i in range(len(batches_of_X)):
    batches_of_X_subsets = np.array_split(batches_of_X[i],10)
    processes = [None]*len(batches_of_X_subsets)
    results_list = [None]*len(batches_of_X_subsets)
    for j in range(len(batches_of_X_subsets)):
        processes[j] = Process(target=prediction_on_batch_wrapper,args=(batches_of_X_subsets[j],results_list,j))
    for j in processes:
        j.start()
    for j in processes:
        j.join()
    if len(results_list) > 1:
        results_array = np.concatenate(tuple(results_list))
    else:
        results_array = results_list[0]

I cannot tell why, within Python's scope rules, the results_list list does not get updated by the prediction_on_batch_wrapper function.

A debugging session reveals that results_list does, in fact, get updated inside the prediction_on_batch_wrapper function... but somehow its scope seems to be local in this second Python file and global in the first.


What is going on here?

Upvotes: 3

Views: 841

Answers (1)

Matt Jordan

Reputation: 2181

This is because you are spawning another process - separate processes do not share any resources, and that includes memory.

Each process is a separate isolated running program, usually visible within Task Manager or ps. When you use Process to start an additional process, you should see a second instance of Python start when you spawn the process.
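One way to see the second instance from inside Python, rather than in Task Manager or ps, is to compare process IDs. This is only a minimal sketch: `report_pid` and `child_pid` are names made up for the illustration, and the queue is used solely to send the child's PID back to the parent.

```python
import os
from multiprocessing import Process, Queue


def report_pid(queue):
    # Runs in the child process: send the child's own PID back to the parent.
    queue.put(os.getpid())


def child_pid():
    """Start one child process and return the PID it reported."""
    queue = Queue()
    child = Process(target=report_pid, args=(queue,))
    child.start()
    pid = queue.get()  # read before join() so the child can flush the queue
    child.join()
    return pid


if __name__ == "__main__":
    print("parent PID:", os.getpid())
    print("child PID: ", child_pid())
```

The two numbers differ, which is the same thing the extra Python entry in the process list is showing.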

A thread is another execution point within your main process, and shares all of the resources of the main process even across multiple cores. All threads within a process are capable of seeing any part of the overall process, although how much they can use depends on the code that you write for the thread and the restrictions of the language in which you write them.

Using Process is like running two instances of your program; you can pass parameters to the new process, but those are copies that are no longer shared once they are passed. For example, if you modified the data within the main process, the new process wouldn't see the changes, since the two processes have completely separate copies of the data.
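The difference can be reproduced without any of the k-NN code from the question. The sketch below is a stripped-down stand-in, not the asker's program: `store_square` and `run_workers` are hypothetical helpers, and the only thing that changes between the two runs is the worker class.

```python
from multiprocessing import Process
from threading import Thread


def store_square(results, index):
    # Writes into whichever `results` object exists in the current process.
    results[index] = index * index


def run_workers(worker_cls, n=3):
    """Start n workers of the given class and return the parent's list."""
    results = [None] * n
    workers = [worker_cls(target=store_square, args=(results, i)) for i in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print("Thread: ", run_workers(Thread))   # [0, 1, 4]
    print("Process:", run_workers(Process))  # [None, None, None]
```

With Thread, every worker writes into the one list that the parent later reads. With Process, each child fills in its own copy of the list, and that copy disappears when the child exits.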

If you want to share data, you should really use threads rather than processes. For most multi-processing needs, threads are preferable to processes, except in the few cases where you need the strict separation.
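If separate processes are still wanted, for example because of the GIL mentioned in the question, the results have to be sent back to the parent explicitly. One option in the standard library is a list owned by a `multiprocessing.Manager`. The following is a rough sketch under that assumption, with `store_square` and `run_with_manager` as illustrative names; it is not a drop-in fix for the question's code.

```python
from multiprocessing import Manager, Process


def store_square(results, index):
    # `results` is a manager proxy here, so the assignment is forwarded
    # to the manager process instead of changing a private copy.
    results[index] = index * index


def run_with_manager(n=3):
    """Collect one result per child process through a managed list."""
    with Manager() as manager:
        results = manager.list([None] * n)
        workers = [Process(target=store_square, args=(results, i)) for i in range(n)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return list(results)  # copy to a plain list before the manager shuts down


if __name__ == "__main__":
    print(run_with_manager())  # [0, 1, 4]
```

Every access to the managed list goes through inter-process communication, so this suits collecting a handful of results better than sharing large arrays.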

Upvotes: 3
