Reputation: 832
I'm trying to parallelize (in some simple way) my machine learning code, which uses the Shogun Machine Learning Toolbox. There are many possible training configurations, so sequential processing is not a suitable approach. I have a learning machine object named mkl_object whose parameters are updated according to a list of grid parameter paths (paths) generated by a path generator I wrote, called gridObj.generateRandomGridPaths(). I'd like a multiprocessing setup in which mkl_object learns one model per path, each on a separate core. For example, given a list of three paths:

paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]

it would learn three models, one per core. See my code and its erroneous output below:
from multiprocessing import Pool
#from functools import partial  # I already tried partial and parmap
#import parmap as par
# My machine learning and random grid search modules:
from mklObj import *
from gridObj import *

# The input training and test data subsets are Shogun feature objects:
[feats_train,
 feats_test,
 labelsTr,
 labelsTs] = load_multiclassToy('../shogun-data/toy/',   # Directory
                                'train_multiclass.dat',  # Sample data set file name
                                'label_multiclass.dat')  # Multi-class labels file name

mkl_object = mklObj()  # Learning machine global instantiation

# Function for mapping:
def mkPool(path):  # path: a list of learning parameters
    global feats_train  # Train and test data produced above
    global labelsTr
    global feats_test
    global labelsTs
    global mkl_object
    if path[0][0] == 'gaussian':  # compare strings with ==, not 'is'
        a = 2 * path[0][1][0] ** 2
        b = 2 * path[0][1][1] ** 2
    else:
        a = path[0][1][0]
        b = path[0][1][1]
    # Setting each list element (paths[i]) as learning parameters:
    mkl_object.mklC = path[5]
    mkl_object.weightRegNorm = path[4]
    mkl_object.fit_kernel(featsTr=feats_train,
                          targetsTr=labelsTr,
                          featsTs=feats_test,
                          targetsTs=labelsTs,
                          kernelFamily=path[0][0],
                          randomRange=[a, b],
                          randomParams=[(a + b) / 2, 1.0],
                          hyper=path[3],
                          pKers=path[2])
    # Return the test error:
    return mkl_object.testerr

if __name__ == '__main__':
    p = Pool(3)
    # Loading the experimentation grid of parameters:
    grid = gridObj(file='gridParameterDic.txt')
    paths = grid.generateRandomGridPaths(trials=3)
    print 'See the path list: ', paths
    [a, b, c] = paths
    # I already tried passing 'paths' and '[paths]'; the error is the same.
    print p.map(mkPool, [a, b, c])
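To check that the Pool.map plumbing itself is sound independently of Shogun, here is a minimal sketch using a hypothetical pure function (score_path, a stand-in for mkPool, not part of the original code) that receives one path as a plain, picklable list and returns a number:

```python
from multiprocessing import Pool

# Hypothetical stand-in for mkPool: a pure function that takes one
# parameter path (a plain, picklable list) and returns a number.
def score_path(path):
    kernel_name, kernel_params = path[0]  # e.g. ('wave', [0.5, 100])
    n_kernels = path[2]                   # e.g. 7
    # Stand-in for a trained model's test error: combine numeric fields.
    return n_kernels * path[5] + sum(kernel_params)

paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4],
         [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0],
         [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]

if __name__ == '__main__':
    pool = Pool(3)
    try:
        print(pool.map(score_path, paths))  # plain lists pickle cleanly
    finally:
        pool.close()
        pool.join()
```

With a pure top-level function and picklable arguments like these, Pool.map runs without errors, which suggests the problem lies in the state shared through the module-level objects rather than in the mapping pattern itself.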
See the error output below:
/usr/bin/python2.7 /home/.../mklCall.py
See the path list: [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]
The entered hyperparameter distribution is not allowed: weibull
The entered hyperparameter distribution is not allowed: linear
The entered hyperparameter distribution is not allowed: triangular
Traceback (most recent call last):
  File "../mklCall.py", line 76, in <module>
    print p.map(mkPool, [a, b, c])
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1
The custom exception above should not be raised, because weibull (and the other values that appear) is a valid input string. Something seems to go wrong during execution for reasons I cannot identify, and the error is repeated len(paths) times. If I run the training for a single path, without using Pool.map(), there are no errors.
I also ran the code sequentially over several paths, and there were no errors:
acc = []
for path in paths:
    print 'A path: ', path
    acc.append(mkPool(path))
    print 'Accuracy: ', acc[-1]
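Since the sequential loop above works but the pooled version does not, one pattern worth comparing is constructing the learner inside the worker function, so each process gets its own fresh instance instead of mutating a module-level global. A minimal sketch, where ToyLearner is a hypothetical stand-in for mklObj (not the real Shogun class):

```python
from multiprocessing import Pool

# Hypothetical stand-in for mklObj: holds the two attributes that
# mkPool sets before fitting.
class ToyLearner(object):
    def fit(self, path):
        self.mklC = path[5]           # mirrors mkl_object.mklC = path[5]
        self.weightRegNorm = path[4]  # mirrors mkl_object.weightRegNorm = path[4]
        return self.mklC * self.weightRegNorm  # stand-in for testerr

def train_one(path):
    learner = ToyLearner()  # fresh, process-local instance; nothing shared
    return learner.fit(path)

paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4],
         [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0]]

if __name__ == '__main__':
    print(Pool(2).map(train_one, paths))
```

Creating the object per task avoids any question of how a parent-process global is (or is not) carried over into the worker processes.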
I followed the Python documentation at https://docs.python.org/2/library/multiprocessing.html . Suggestions, examples, or possible solutions would be much appreciated.
Thank you in advance.
Upvotes: 2
Views: 371