top bantz

Reputation: 615

Multiprocessing with a function that returns a value, then appending that value

I've seen bundles of questions on here about multiprocessing, none of which seem to answer my specific problem.

Obviously my real function is much more complex, but I've tried to simplify it as much as possible. A single call currently takes around 5 minutes, so it would be great if I could get this working in parallel.

Below is basically how I am running things currently: looping through each 'name' in my list, passing it to the function, and then adding the returned value to the dictionary.

import pandas as pd

df = pd.DataFrame({'Matthew': [4, 9, 6], 'Mark': [2, 3, 5], 'Luke': [10, 1, 8], 'John': [20, 22, 21]})

def sum_funct(name, df):
    # Sum one column and return it as a plain Python int
    return int(df[name].sum())

totals_dict = {}

names = ['Matthew', 'Mark', 'Luke', 'John']

for name in names:
    totals_dict[name] = sum_funct(name, df)

print(totals_dict)

{'Matthew': 19, 'Mark': 10, 'Luke': 19, 'John': 63}

What I'd love to be able to do is use multiprocessing for the "for name in names:" loop, but so far I can't find anything about how to use the values a function returns while multiprocessing. I've come across some answers which touch on functions that return values, but none have been of any help.

Upvotes: 0

Views: 86

Answers (1)

Adon Bilivit

Reputation: 26993

The ProcessPoolExecutor from concurrent.futures could be used for this. For example:

from concurrent.futures import ProcessPoolExecutor
from pandas import DataFrame
from functools import partial

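def sum_funct(df, name):
    # Return (name, total) pairs so the results can be passed straight to dict();
    # int() converts the NumPy integer to a plain Python int
    return name, int(df[name].sum())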

def main():
    dict_ = {'Matthew': [4, 9, 6], 'Mark': [2, 3, 5], 'Luke': [10, 1, 8], 'John': [20, 22, 21]}
    with ProcessPoolExecutor() as executor:
        total_dict = dict(executor.map(partial(sum_funct, DataFrame(dict_)), dict_))
        print(total_dict)

if __name__ == '__main__':
    main()

Output:

{'Matthew': 19, 'Mark': 10, 'Luke': 19, 'John': 63}
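If you would rather handle each result as soon as its worker finishes, submit() and as_completed() are an alternative to map(). The following is only a rough sketch: a plain dict of lists stands in for the DataFrame so that it runs without pandas, and collect_totals is a hypothetical helper name, not something from the question.

```python
import concurrent.futures as cf

def sum_funct(data, name):
    # Stand-in for the slow function; its return value is what gets stored
    return sum(data[name])

def collect_totals(executor, data):
    # Hypothetical helper: remember which name each future was submitted for
    future_to_name = {executor.submit(sum_funct, data, name): name for name in data}
    totals = {}
    for future in cf.as_completed(future_to_name):
        # result() gives back sum_funct's return value (or re-raises its exception)
        totals[future_to_name[future]] = future.result()
    return totals

def main():
    data = {'Matthew': [4, 9, 6], 'Mark': [2, 3, 5], 'Luke': [10, 1, 8], 'John': [20, 22, 21]}
    with cf.ProcessPoolExecutor() as executor:
        print(collect_totals(executor, data))

if __name__ == '__main__':
    main()
```

The keys are inserted in completion order, so the order of the printed dictionary may differ from run to run.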

Note:

If the dictionary has more keys than you have CPUs, you should probably consider calculating a suitable value for max_workers (passed to ProcessPoolExecutor).
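One possible way to pick that value is sketched below. The rule used here (no more workers than tasks or CPUs) is an assumption, not the only sensible choice, and pick_max_workers is a hypothetical helper. A plain dict of lists again stands in for the DataFrame so the sketch runs without pandas.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def sum_funct(data, name):
    # Stand-in for the slow function; returns a (name, total) pair
    return name, sum(data[name])

def pick_max_workers(n_tasks):
    # Hypothetical helper: at least 1 worker, never more than tasks or CPUs
    return max(1, min(n_tasks, os.cpu_count() or 1))

def main():
    data = {'Matthew': [4, 9, 6], 'Mark': [2, 3, 5], 'Luke': [10, 1, 8], 'John': [20, 22, 21]}
    workers = pick_max_workers(len(data))
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # Iterating over the dict yields its keys, one task per name
        total_dict = dict(executor.map(partial(sum_funct, data), data))
    print(total_dict)

if __name__ == '__main__':
    main()
```

For a function that takes minutes per call and is CPU-bound, more workers than CPUs is unlikely to help, which is why the sketch caps the value at os.cpu_count().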

Upvotes: 2
