alexyichu

Reputation: 3642

Faster Computing Time with Python and Sklearn

I'm doing a thesis on model assessment techniques for machine learning classification tasks. I'm using sklearn models because they let me write mostly generic code, which matters since I have lots of different datasets. One part of sklearn's model output is predict_proba, which returns probability estimates. For large datasets with lots of data points, computing predict_proba for each data point takes a long time. I opened htop and saw Python using only a single core for the calculations, so I wrote the following function:

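from joblib import Parallel, delayed
import multiprocessing

num_cores = multiprocessing.cpu_count()

# clf:    an already-fitted sklearn classifier (global)
# first:  2-D NumPy array of samples, shape (firstm, p2)
# p2:     number of features per sample
# firstm: number of samples (rows) in `first`
def makeprob(r, first, p2, firstm):
    # predict_proba expects a 2-D array, so reshape the single row to (1, p2)
    reshaped_r = first[r].reshape(1, p2)
    probo = clf.predict_proba(reshaped_r)
    # keep only the highest class probability for this sample
    probo = probo.max()
    print('Currently at %(perc)s percent' % {'perc': (r / firstm) * 100})
    return probo

# using multiple cores to run the function 'makeprob', one task per row
results = Parallel(n_jobs=num_cores)(delayed(makeprob)(r, first, p2, firstm) for r in range(firstm))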

Now htop shows all cores being used, and the speed-up is significant, but not nearly as fast as I would like. If anybody knows a way to speed this up, or can point me in the right direction for getting faster computation in this scenario, that would be great.
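The per-row loop in the question can be contrasted with a single batched call: predict_proba already accepts a 2-D array holding many samples, so much of the per-call overhead may disappear if it is called once on the whole array. The sketch below is an illustration under assumptions, not the poster's code: synthetic data from make_classification and a LogisticRegression stand in for the thesis datasets and models, and the helper names max_proba_per_row and max_proba_batched are hypothetical. It computes the same per-sample maximum probability both ways and checks that they agree.

```python
# Minimal sketch: per-row predict_proba (as in makeprob) vs. one batched call.
# Assumptions: synthetic data and a LogisticRegression stand in for the real
# datasets/models; the helper function names are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)


def max_proba_per_row(clf, X):
    """One predict_proba call per sample, like makeprob (without joblib)."""
    n_samples, n_features = X.shape
    return np.array([
        clf.predict_proba(X[r].reshape(1, n_features)).max()
        for r in range(n_samples)
    ])


def max_proba_batched(clf, X):
    """A single vectorized predict_proba call over all rows, then a row-wise max."""
    return clf.predict_proba(X).max(axis=1)


per_row = max_proba_per_row(clf, X)
batched = max_proba_batched(clf, X)
# The two results should agree up to floating-point rounding.
print(np.allclose(per_row, batched))
```

Timing both functions (for example with time.perf_counter) on one of the real datasets is a quick way to see how much of the cost is per-call overhead rather than the model's own computation.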

Upvotes: 2

Views: 411

Answers (1)

SciPy

Reputation: 6100

The loss of performance depends on three elements:

  1. Your Python program: make sure the datasets are optimized so they don't overuse RAM (i.e., make a subset containing only the key variables you need).
  2. The Python environment: if you run scikit-learn in an IPython (Jupyter) Notebook, multiprocessing will not run as fast as in a Python script. See IPython for parallel computing. A Python script will be faster.
  3. Python libraries: several Python libraries are natively designed to use all the resources of the computer. For example, with TensorFlow the supported device types are CPU and GPU (and you can use several GPUs).
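Points 1 and 3 above can be sketched together in a small standalone script (in the spirit of point 2): keep only the feature columns the model needs, and let an estimator that supports it parallelize prediction through its own n_jobs parameter instead of dispatching one joblib task per row. This is a sketch under assumptions: the data is synthetic, the column indices in KEY_COLUMNS are hypothetical, and RandomForestClassifier is used only because its predict_proba honours n_jobs by scoring trees in parallel; not every sklearn estimator parallelizes prediction this way.

```python
# Sketch: subset to the key columns (point 1) and let the estimator's own
# n_jobs parallelize prediction (point 3), as a plain script (point 2).
# Assumptions: synthetic data; KEY_COLUMNS is a hypothetical feature choice.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

KEY_COLUMNS = [0, 1, 2, 5, 8]  # hypothetical "key variables"

X_full, y = make_classification(n_samples=5000, n_features=50,
                                n_informative=5, random_state=0)

# Keep only the needed columns as a compact contiguous copy;
# X_full could be deleted afterwards to free its memory.
X = np.ascontiguousarray(X_full[:, KEY_COLUMNS])
print(f"full: {X_full.nbytes} bytes, subset: {X.nbytes} bytes")

# n_jobs=-1 lets the forest use all cores for both fit and predict_proba.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)

# One batched call over every row, then the maximum class probability per sample.
max_proba = clf.predict_proba(X).max(axis=1)
print(max_proba[:5])
```

The design choice here is to keep a single batched call and push the parallelism into the library, so no Python-level task is created per data point.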

Upvotes: 1
