James Steele

Reputation: 654

Python multiprocessing with dataframe and multiple arguments

According to this answer, starmap should be used when multiprocessing with multiple arguments. My problem is that one of the arguments is a constant dataframe. When I build the list of arguments for my function and starmap, the dataframe gets stored over and over, once per task. I thought I could get around this using a Namespace, but I can't seem to figure it out. The code below hasn't thrown an error, but after 30 minutes no files have been written. Without multiprocessing, calling write_file directly, the code runs in under 10 minutes.

import pandas as pd
import numpy as np
import multiprocessing as mp

def write_file(df, colIndex, splitter, outpath):
    with open(outpath + splitter + ".txt", 'a') as oFile:
        data = df[df.iloc[:,colIndex] == splitter]
        data.to_csv(oFile, sep = '|', index = False, header = False)

mgr = mp.Manager()
ns = mgr.Namespace()
df = pd.read_table(file_, delimiter = '|', header = None)
ns.df = df.iloc[:,1] = df.iloc[:,1].astype(str)
fileList = list(df.iloc[:, 1].astype('str').unique())
for item in fileList:
    with mp.Pool(processes=3) as pool:
        pool.starmap(write_file, np.array((ns, 1, item, outpath)).tolist())

Upvotes: 2

Views: 2393

Answers (2)

Vivek

Reputation: 1

I had the same issue: I needed to pass two existing dataframes to a function called through starmap. It turns out there is no need to declare the dataframe as a function argument at all. You can refer to it inside the function as a global, as described in the accepted answer here: Pandas: local vs global dataframe in functions
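A minimal sketch of the global approach applied to the question's write_file, not a tested drop-in fix. It assumes the global is set through the Pool's initializer, because a plain module-level global is only inherited by workers when processes are forked; with the spawn start method each worker starts fresh. The names init_worker and split_to_files are hypothetical, and the path is built with os.path.join and opened in write mode rather than append mode as in the question.

```python
import multiprocessing as mp
import os

import pandas as pd

_df = None  # per-worker global, filled in once by init_worker


def init_worker(df):
    """Runs once in each worker process and stores the dataframe globally."""
    global _df
    _df = df


def write_file(col_index, splitter, outpath):
    """Write the rows whose column `col_index` equals `splitter` to one file."""
    data = _df[_df.iloc[:, col_index] == splitter]
    target = os.path.join(outpath, f"{splitter}.txt")
    data.to_csv(target, sep="|", index=False, header=False)
    return target


def split_to_files(df, col_index, outpath, processes=3):
    """Hypothetical driver: one task per unique value, one pool for all tasks."""
    keys = df.iloc[:, col_index].unique()
    # Each task carries only small arguments; the dataframe is not in the tuple.
    tasks = [(col_index, key, outpath) for key in keys]
    with mp.Pool(processes, initializer=init_worker, initargs=(df,)) as pool:
        return pool.starmap(write_file, tasks)
```

With this layout the dataframe is sent once per worker instead of once per task. The call to split_to_files(df, 1, outpath) should sit under an `if __name__ == "__main__":` guard so that it also works where workers are spawned rather than forked.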

Upvotes: 0

James Steele

Reputation: 654

To anyone else struggling with this issue, my solution was to split the dataframe into chunks and build an iterable of (chunk, args) tuples, one per chunk:

from itertools import product
iterable = product(np.array_split(data, 15), [args])  # one (chunk, args) tuple per chunk

Then, pass this iterable to the starmap:

pool.starmap(func, iterable)
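A self-contained sketch of the same idea, under a few assumptions. The worker count_matches and the helpers build_tasks and run_parallel are hypothetical stand-ins for func and the surrounding code, and the chunks are cut with plain iloc slices instead of np.array_split so the example does not depend on how NumPy handles a DataFrame.

```python
from itertools import product
import multiprocessing as mp

import numpy as np
import pandas as pd


def count_matches(chunk, args):
    """Hypothetical worker: count rows of `chunk` whose column equals a value."""
    col_index, value = args
    return int((chunk.iloc[:, col_index] == value).sum())


def build_tasks(data, args, n_chunks=15):
    """Pair each near-equal row slice of `data` with the same `args` tuple."""
    bounds = np.linspace(0, len(data), n_chunks + 1, dtype=int)
    chunks = [data.iloc[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]
    # product(chunks, [args]) yields (chunk, args) for every chunk.
    return list(product(chunks, [args]))


def run_parallel(data, args, processes=3, n_chunks=15):
    """Each worker call receives one chunk plus the constant arguments."""
    with mp.Pool(processes) as pool:
        return pool.starmap(count_matches, build_tasks(data, args, n_chunks))
```

Every chunk is still pickled and sent to a worker, but across all tasks that adds up to roughly one copy of the dataframe rather than one full copy per task. As with any Pool usage, run_parallel should be called under an `if __name__ == "__main__":` guard.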

Upvotes: 1
