Talha Anwar

Reputation: 2959

Concatenate the output of a multiprocessing function in Python

I have a dataset with 64 columns, and I want to compute 30 features for each column. My system has 12 processors, and I want to split the columns across them. For example, processor 1 computes the features of columns 1-8, processor 2 computes the features of columns 9-16, and so on. At the end, I want to concatenate the outputs of all the subprocesses.

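import numpy as np
from multiprocessing import Process

def process(start, subject):
    # fe is my feature extractor (defined elsewhere); it computes 30 features per column
    X = fe.fit_transform(subject[:, start:start + 8])
    # X is computed here, but I don't know how to get it back to the parent process

if __name__ == '__main__':
    subject = np.load('data.npy')
    process_list = []
    for start in range(0, 64, 8):  # 8 chunks of 8 columns each
        p = Process(target=process, args=(start, subject))
        p.start()
        process_list.append(p)
    for p in process_list:  # join only after all processes have started, so they run in parallel
        p.join()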

What I want is to concatenate the output X of the function process. Normally I would append each X to a list and then concatenate the list, but I am confused about how to do that here. Do I need to append inside the function, or below if __name__ == '__main__':?
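A minimal sketch of one common way to get results back from Process workers. Each child runs in its own memory space, so a list appended to inside the function would not be visible to the parent; instead each child puts its output on a multiprocessing.Queue, and the parent (below if __name__ == '__main__':) collects and concatenates. As an assumption, the mean-only cal_feature from the EDIT stands in for fe.fit_transform, and the helper names worker and run are hypothetical.

```python
import numpy as np
from multiprocessing import Process, Queue

def cal_feature(subject):
    # stand-in for the real feature extractor: mean over the data-point axis
    return np.mean(subject, axis=-1)

def worker(idx, chunk, queue):
    # send the chunk index with the result so the parent can restore the column order
    queue.put((idx, cal_feature(chunk)))

def run(subject, n_procs):
    queue = Queue()
    chunks = np.array_split(subject, n_procs, axis=1)  # split along the column axis
    procs = [Process(target=worker, args=(i, c, queue)) for i, c in enumerate(chunks)]
    for p in procs:
        p.start()
    # drain the queue before joining: a child with data still buffered in the queue may not exit
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return np.concatenate([results[i] for i in range(n_procs)], axis=1)

if __name__ == '__main__':
    data = np.random.randint(0, 100, size=(30, 10, 20))
    X = run(data, 2)
    print(X.shape)  # (30, 10)
```

The chunk index travels with each result because children can finish in any order, and the queue is emptied before join() because the multiprocessing docs warn that joining a process which still has items buffered in a queue can deadlock.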

Other Way

The other approach I am trying uses Pool:

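import numpy as np
from multiprocessing import Pool

def cal_feature(subject):
    # fe is my feature extractor (defined elsewhere)
    return fe.fit_transform(subject)

if __name__ == '__main__':
    subject = np.load('data.npy')
    p = Pool()
    result = p.map(cal_feature, subject)
    p.close()
    p.join()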

In this approach I don't understand what is being split across the processes. Is it a feature split or a column split? By feature split I mean that processor 1 computes 5 of the 30 features for all 64 columns, processor 2 computes the next 5 features for all 64 columns, and so on.
By column split I mean that processor 1 computes all the features for columns 1-8, processor 2 for the next 8 columns, and so on.
The second approach gives me this error:
IndexError: too many indices for array.
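One likely explanation, sketched under the assumption that fe.fit_transform indexes its input with three axes: Pool.map iterates over its second argument, and iterating over a NumPy array walks its first axis. So p.map(cal_feature, subject) splits the work by trial (axis 0), not by column or by feature, and each call receives a 2-D array. The function needs_3d below is hypothetical and only reproduces the error.

```python
import numpy as np

data = np.random.randint(0, 100, size=(30, 10, 20))

# Pool.map(func, data) hands func one element of the iterable per call;
# iterating over a 3-D array yields its 2-D slices along axis 0 (one per trial)
first_item = next(iter(data))
print(first_item.shape)  # (10, 20)

def needs_3d(subject):
    # hypothetical feature function written for the full (trials, columns, points) array
    return subject[:, :, 0]

try:
    needs_3d(first_item)
except IndexError as err:
    print('IndexError:', err)
```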

EDIT

import numpy as np

data = np.random.randint(0, 100, size=(30, 10, 20))

def cal_feature(subject):
    return np.mean(subject, axis=-1)

result = cal_feature(data)
print(result)

This is a simplified version of my work. Besides the mean, there are other features, which are calculated by another function.
In this simplified example, axis 0 is trials, axis 1 is columns, and axis 2 is data points. cal_feature calculates the mean over the data points of each column in each trial, which gives a result of shape (30, 10).
Suppose I have 2 processors. What I want is for processor 1 to calculate the mean of the first 5 columns for all 30 trials, giving a result of shape (30, 5), and for processor 2 to calculate the mean of the next 5 columns, also giving shape (30, 5). Then I concatenate them to get a final result of shape (30, 10).
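The column split described above works because each column's feature depends only on that column's data. A quick serial check (no multiprocessing yet) on the toy data: computing each half separately and concatenating along axis 1 reproduces the full result.

```python
import numpy as np

def cal_feature(subject):
    # mean over the data-point axis: (trials, columns, points) -> (trials, columns)
    return np.mean(subject, axis=-1)

data = np.random.randint(0, 100, size=(30, 10, 20))

# "processor 1" gets columns 0-4, "processor 2" gets columns 5-9 (run serially here)
part1 = cal_feature(data[:, :5])
part2 = cal_feature(data[:, 5:])
print(part1.shape, part2.shape)  # (30, 5) (30, 5)

# concatenating along the column axis rebuilds the full (30, 10) result
combined = np.concatenate([part1, part2], axis=1)
print(combined.shape)  # (30, 10)
print(np.allclose(combined, cal_feature(data)))  # True
```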

Upvotes: 0

Views: 1167

Answers (1)

kederrac

Reputation: 17322

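You can try:

p = Pool(12)  # one worker per processor on a 12-processor machine
result = p.map(cal_feature, np.array_split(subject, 12, axis=1))  # split along the column axis
X = np.concatenate(result, axis=1)  # put the column chunks back together in order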
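A self-contained sketch of the same split/map/concatenate idea on the toy data from the question's EDIT, assuming the mean-only cal_feature. It splits along axis 1 (columns) rather than np.split's default axis 0 (trials), uses np.array_split so the column count need not divide evenly (for example 64 columns over 12 workers), and concatenates along axis 1. parallel_features is a hypothetical helper name.

```python
import numpy as np
from multiprocessing import Pool

def cal_feature(subject):
    # simplified feature from the question: mean over the data-point axis
    return np.mean(subject, axis=-1)

def parallel_features(subject, n_workers):
    # split along axis 1 (columns); array_split tolerates uneven splits
    chunks = np.array_split(subject, n_workers, axis=1)
    with Pool(n_workers) as pool:
        # map returns results in the same order as the input chunks
        parts = pool.map(cal_feature, chunks)
    return np.concatenate(parts, axis=1)

if __name__ == '__main__':
    data = np.random.randint(0, 100, size=(30, 10, 20))
    result = parallel_features(data, 2)
    print(result.shape)  # (30, 10)
```

Because Pool.map preserves input order, no extra bookkeeping is needed to reassemble the columns.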

Upvotes: 1
