Reputation: 832
I'm trying to parallelize (in some simple way) my machine learning code, which uses the Shogun Machine Learning Toolbox. There are many possible training configurations, so sequential processing is not a suitable approach. I have a learning machine object named mkl_object whose parameters are updated according to a list of grid parameter paths (paths) generated by a path generator I wrote, called gridObj.generateRandomGridPaths(). I'd like a multiprocessing setup in which mkl_object learns one model per path, each on a separate core. For example, given a list of three paths:

paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]

it would learn three models, one per core. See my code and its erroneous output below:
from multiprocessing import Pool
#from functools import partial  # I already tried partial and parmap
#import parmap as par
# My machine learning and random grid search modules:
from mklObj import *
from gridObj import *

# The input training and test data subsets are Shogun feature objects:
[feats_train,
 feats_test,
 labelsTr,
 labelsTs] = load_multiclassToy('../shogun-data/toy/',   # Directory
                                'train_multiclass.dat',  # Sample data set file name
                                'label_multiclass.dat')  # Multi-class labels file name

mkl_object = mklObj()  # Learning machine global instantiation

# Function for mapping:
def mkPool(path):  # path: a list of learning parameters
    global feats_train  # Train and test data produced above
    global labelsTr
    global feats_test
    global labelsTs
    global mkl_object
    if path[0][0] == 'gaussian':  # compare strings with ==, not 'is'
        a = 2 * path[0][1][0] ** 2
        b = 2 * path[0][1][1] ** 2
    else:
        a = path[0][1][0]
        b = path[0][1][1]
    # Setting each list element (paths[i]) as learning parameters:
    mkl_object.mklC = path[5]
    mkl_object.weightRegNorm = path[4]
    mkl_object.fit_kernel(featsTr=feats_train,
                          targetsTr=labelsTr,
                          featsTs=feats_test,
                          targetsTs=labelsTs,
                          kernelFamily=path[0][0],
                          randomRange=[a, b],
                          randomParams=[(a + b) / 2, 1.0],
                          hyper=path[3],
                          pKers=path[2])
    # Return the test error:
    return mkl_object.testerr

if __name__ == '__main__':
    p = Pool(3)
    # Loading the experimentation grid of parameters:
    grid = gridObj(file='gridParameterDic.txt')
    paths = grid.generateRandomGridPaths(trials=3)
    print 'See the path list: ', paths
    [a, b, c] = paths
    # I already tried passing 'paths' and '[paths]'; the error is the same.
    print p.map(mkPool, [a, b, c])
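To check that the Pool.map plumbing itself is sound independently of Shogun, here is a minimal sketch using a hypothetical pure function (score_path, a stand-in for mkPool, not part of the original code) that receives one path as a plain, picklable list and returns a number:

```python
from multiprocessing import Pool

# Hypothetical stand-in for mkPool: a pure function that takes one
# parameter path (a plain, picklable list) and returns a number.
def score_path(path):
    kernel_name, kernel_params = path[0]  # e.g. ('wave', [0.5, 100])
    n_kernels = path[2]                   # e.g. 7
    # Stand-in for a trained model's test error: combine numeric fields.
    return n_kernels * path[5] + sum(kernel_params)

paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4],
         [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0],
         [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]

if __name__ == '__main__':
    pool = Pool(3)
    try:
        print(pool.map(score_path, paths))  # plain lists pickle cleanly
    finally:
        pool.close()
        pool.join()
```

With a pure top-level function and picklable arguments like these, Pool.map runs without errors, which suggests the problem lies in the state shared through the module-level objects rather than in the mapping pattern itself.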
See the error output below:
/usr/bin/python2.7 /home/.../mklCall.py
See the path list: [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]
The entered hyperparameter distribution is not allowed: weibull
The entered hyperparameter distribution is not allowed: linear
The entered hyperparameter distribution is not allowed: triangular
Traceback (most recent call last):
  File "../mklCall.py", line 76, in <module>
    print p.map(mkPool, [a, b, c])
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1
The custom exception above should not be raised, because weibull (and the other values that appear) is a valid input string. Something seems to go wrong during execution for reasons I cannot identify, and the error is repeated len(paths) times. If I run the training for a single path, without using Pool.map(), there are no errors.
I also ran the code sequentially over several paths, and there were no errors:
acc = []
for path in paths:
    print 'A path: ', path
    acc.append(mkPool(path))
    print 'Accuracy: ', acc[-1]
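Since the sequential loop above works but the pooled version does not, one pattern worth comparing is constructing the learner inside the worker function, so each process gets its own fresh instance instead of mutating a module-level global. A minimal sketch, where ToyLearner is a hypothetical stand-in for mklObj (not the real Shogun class):

```python
from multiprocessing import Pool

# Hypothetical stand-in for mklObj: holds the two attributes that
# mkPool sets before fitting.
class ToyLearner(object):
    def fit(self, path):
        self.mklC = path[5]           # mirrors mkl_object.mklC = path[5]
        self.weightRegNorm = path[4]  # mirrors mkl_object.weightRegNorm = path[4]
        return self.mklC * self.weightRegNorm  # stand-in for testerr

def train_one(path):
    learner = ToyLearner()  # fresh, process-local instance; nothing shared
    return learner.fit(path)

paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4],
         [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0]]

if __name__ == '__main__':
    print(Pool(2).map(train_one, paths))
```

Creating the object per task avoids any question of how a parent-process global is (or is not) carried over into the worker processes.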
I followed the Python documentation at https://docs.python.org/2/library/multiprocessing.html . Suggestions, examples, or possible solutions would be much appreciated.
Thank you in advance.
Upvotes: 2
Views: 371