Reputation: 2959
I have a dataset with 64 columns, and I want to compute 30 features for each column. I have a 12-processor system and want to split the columns across these processors. For example, processor 1 computes the features for columns 1-8, processor 2 for columns 9-16, and so on. At the end I want to concatenate the output of each subprocess.
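import numpy as np
from multiprocessing import Process

def process(start, subject):
    # fe is my feature extractor; X is computed here but never reaches the parent
    X = fe.fit_transform(subject[:, start:start + 8])

if __name__ == '__main__':
    subject = np.load('data.npy')
    process_list = []
    for start in range(0, 64, 8):  # column blocks 0-7, 8-15, ..., 56-63
        process_list.append(Process(target=process, args=(start, subject)))
        process_list[-1].start()
    for p in process_list:  # join only after every process has been started
        p.join()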
What I want is to concatenate the output X from the function process. The simple way would be to append each X to a list and then concatenate the list, but I am confused about how to do it. Does the append need to happen inside the function, or below if __name__ == '__main__':?
Other Way
The other way I am trying is using Pool. Here is the approach:
import numpy as np
from multiprocessing import Pool

def cal_feature(subject):
    return fe.fit_transform(subject)

if __name__ == '__main__':
    subject = np.load('data.npy')
    p = Pool()
    result = p.map(cal_feature, subject)
    p.close()
    p.join()
In this approach I do not understand what is being shared across the processes. Is the work split by feature or by column? By feature split, I mean processor 1 takes 5 of the 30 features for all 64 columns, and processor 2 takes the next 5 features for all 64 columns.
Or does processor 1 take columns 1-8 for all features, and processor 2 the next 8 columns?
The second approach gives me this error:
IndexError: too many indices for array.
EDIT
import numpy as np

data = np.random.randint(0, 100, size=(30, 10, 20))

def cal_feature(subject):
    return np.mean(subject, -1)

result = cal_feature(data)
print(result)
This is a simplified version of my work. Instead of just the mean, there are other features as well, which are calculated by another function.
In the simplified example above, axis 0 holds trials, axis 1 holds columns, and axis 2 holds data points. cal_feature calculates the mean over the data points for every trial and column, so the result has shape (30, 10).
Suppose I have 2 processors. I want one processor to calculate the mean of the first 5 columns for all 30 trials, giving a result of shape (30, 5), and processor 2 to calculate the mean of the next 5 columns, also giving shape (30, 5). Then I want to concatenate them to get a final result of shape (30, 10).
Upvotes: 0
Views: 1167
Reputation: 17322
You can try:
p = Pool(20) # your max workers = 2 * num cpu cores
result = p.map(cal_feature, np.split(subject,20))
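Note that np.split works along axis 0 by default and raises an error when the array cannot be divided equally, and that p.map hands each element of the iterable to one worker call. If the goal is the column split described in the EDIT, the chunks probably need to be cut along axis 1 and the results concatenated afterwards. Below is a minimal sketch under those assumptions: it uses np.array_split (which tolerates uneven chunks) instead of np.split, uses the mean as a stand-in for the real feature extractor, and the names features_by_column and map_fn are made up for illustration.

```python
import numpy as np
from multiprocessing import Pool


def cal_feature(chunk):
    # Stand-in for the real feature extractor: mean over the data-point axis.
    return np.mean(chunk, axis=-1)


def features_by_column(data, n_chunks, map_fn=map):
    # Hypothetical helper: split along axis 1 (columns).
    # array_split tolerates chunk counts that do not divide the axis evenly.
    chunks = np.array_split(data, n_chunks, axis=1)
    # map_fn is the builtin map (serial) or pool.map (parallel).
    # Both return results in the same order as the input chunks.
    parts = list(map_fn(cal_feature, chunks))
    # Stitch the per-chunk results back together along the column axis.
    return np.concatenate(parts, axis=1)


if __name__ == '__main__':
    data = np.random.randint(0, 100, size=(30, 10, 20))
    with Pool(2) as pool:
        result = features_by_column(data, 2, map_fn=pool.map)
    print(result.shape)  # (30, 10)
```

Each worker receives one (30, 5, 20) chunk and returns a (30, 5) array, so nothing has to be appended inside the worker function; the parent collects the return values from pool.map. If the real extractor returns a different shape per chunk, the axis passed to np.concatenate may need to change.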
Upvotes: 1