thposs
thposs

Reputation: 328

Why does my Python multiprocessing result not append on callback?

I can't seem to figure out why my results are not appending while using the multiprocessing package.

I've looked at many similar questions but can't seem to figure out what I'm doing wrong. This my first attempt at multiprocessing (as you might be able to tell) so I don't quite understand all the jargon in the documentation which might be part of the problem

Running this in PyCharm prints an empty list instead of the desired list of row sums.

import numpy as np
from multiprocessing import Pool
import timeit

data = np.random.randint(0, 100, size=(5, 1000))


def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added


results = []
tic = timeit.default_timer()  # start timer

pool = Pool(3)
if __name__ == '__main__':
    for row in data:
        pool.apply_async(add_these, row, callback=results.append)


toc = timeit.default_timer()  # start timer
print(toc - tic)
print(results)  

EDIT: Closing and joining pool, then printing results within the if name==main block results in the following error being raised repeatedly until I manually stop execution:

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.

Code to reproduce error:

import numpy as np
from multiprocessing import Pool, freeze_support
import timeit

data = np.random.randint(0, 100, size=(5, 1000))


def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added


results = []
tic = timeit.default_timer()  # start timer

pool = Pool(3)
if __name__ == '__main__':
    for row in data:
        pool.apply_async(add_these, (row,), callback=results.append)
    pool.close()
    pool.join()
    print(results)


toc = timeit.default_timer()  # end timer
print(toc - tic)

Upvotes: 1

Views: 402

Answers (1)

WMRamadan
WMRamadan

Reputation: 1210

I think this would be a more correct way:

import numpy as np
from multiprocessing import Pool
import timeit

data = np.random.randint(0, 100, size=(5, 1000))


def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added


results = []

if __name__ == '__main__':
    with Pool(processes=3) as pool:
        for row in data:
            results = pool.apply_async(add_these, (row,))
            try:
                print(results.get(timeout=1))
            except TimeoutError:
                print("Multiprocessing Timeout")

Upvotes: 1

Related Questions