Parallel Processing using Multiprocessing in Python

Question

I'm new to parallel processing in Python. I have a large dataframe with people's names and, for each person, a comma-separated list of the countries they lived in. A sample dataframe looks like this:

I have a chunk of code that takes this dataframe and splits the countries into separate columns. Here is the code:

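import pandas as pd

def split_country(data):
    # Build one {'Name', 'value'} record for every country in each row's Country list
    d_list = []
    for index, row in data.iterrows():
        for value in str(row['Country']).split(','):
            d_list.append({'Name': row['Name'],
                           'value': value.strip()})  # drop spaces after commas
    # DataFrame.append was removed in pandas 2.0; build a new frame from the records
    data = pd.DataFrame(d_list)
    # Count each country per name, then pivot the countries into columns (0 = absent)
    data = data.groupby('Name')['value'].value_counts()
    data = data.unstack(level=-1).fillna(0)
    return data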
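For reference, here is a small self-contained sketch that runs the loop-based function on toy data (the names and countries are made up for illustration) next to a vectorized variant. The helper split_country_vectorized is a hypothetical name, and it swaps iterrows for pandas' own str.split, explode and crosstab. Assuming each Country cell is a comma-separated string, both should return the same count table.

```python
import pandas as pd


def split_country(data):
    # Loop version: one (Name, value) record per country in each row's list
    d_list = []
    for _, row in data.iterrows():
        for value in str(row['Country']).split(','):
            d_list.append({'Name': row['Name'], 'value': value.strip()})
    counts = pd.DataFrame(d_list).groupby('Name')['value'].value_counts()
    # Pivot country names into columns; missing combinations become 0
    return counts.unstack(level=-1).fillna(0)


def split_country_vectorized(data):
    # Hypothetical alternative: split every cell into a list, give each
    # country its own row with explode, then count Name/country pairs
    exploded = data.assign(
        value=data['Country'].astype(str).str.split(',')
    ).explode('value', ignore_index=True)
    exploded['value'] = exploded['value'].str.strip()
    return pd.crosstab(exploded['Name'], exploded['value'])


# Hypothetical toy data, for illustration only
df = pd.DataFrame({'Name': ['Ann', 'Bob'],
                   'Country': ['USA,UK', 'UK, France']})
print(split_country(df))
print(split_country_vectorized(df))
```

If the vectorized version is fast enough on the full dataframe, it may remove the need for multiprocessing altogether, since it avoids the per-row Python loop.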

The final output is something like this:

I'm trying to parallelize this by passing my dataframe (df) to a multiprocessing pool:

import multiprocessing as mp
result = []
pool = mp.Pool(mp.cpu_count())  # one worker process per CPU core
result.append(pool.map(split_country, [row for row in df]))

But the processing never finishes, even with a toy dataset like the one above. I'm completely new to this, so I would appreciate any help.
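Possible causes of a pool that never finishes: on platforms that start workers with the "spawn" method (Windows, and macOS since Python 3.8), each worker re-imports the main module, so pool code needs an if __name__ == '__main__': guard, and functions defined in a Jupyter notebook may not be importable by the workers at all (moving split_country into a .py module is a common workaround). Separately, [row for row in df] iterates over the column labels, so each worker receives a string rather than a block of rows. Below is a minimal sketch, not a definitive fix, using a hypothetical helper parallel_split_country: it splits the rows into at most one chunk per worker, maps the chunks over a pool inside the guard, and sums the partial count tables by name.

```python
import multiprocessing as mp

import pandas as pd


def split_country(data):
    # Per-chunk worker: count each country listed for each name
    d_list = []
    for _, row in data.iterrows():
        for value in str(row['Country']).split(','):
            d_list.append({'Name': row['Name'], 'value': value.strip()})
    counts = pd.DataFrame(d_list).groupby('Name')['value'].value_counts()
    return counts.unstack(level=-1).fillna(0)


def parallel_split_country(df, n_workers=None):
    # Hypothetical helper. Never use more chunks than rows, so that no
    # worker receives an empty frame (split_country fails on one).
    n = min(n_workers or mp.cpu_count(), len(df))
    # Take every n-th row, giving n interleaved chunks of rows
    chunks = [df.iloc[i::n] for i in range(n)]
    with mp.Pool(n) as pool:
        results = pool.map(split_country, chunks)
    # Chunks may see different countries (NaN after concat) and the same
    # name may occur in several chunks; the per-name sum treats NaN as 0.
    return pd.concat(results).groupby(level=0).sum()


if __name__ == '__main__':
    # The guard keeps spawned workers from re-running this block when
    # they import the main module.
    df = pd.DataFrame({'Name': ['Ann', 'Bob', 'Ann'],
                       'Country': ['USA,UK', 'UK, France', 'Japan']})
    print(parallel_split_country(df, n_workers=2))
```

Each chunk has to be pickled and sent to a worker process, so this only pays off when split_country does substantial work per chunk; for simple string columns, a vectorized pandas approach (str.split followed by explode) may well be faster than any process pool.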
