Reputation: 426
I am using scikit-learn for my machine learning work. I followed the steps exactly as described in the official documentation, but I ran into two problems. Here is the main part of the code:
1) trdata is training data created using sklearn's train_test_split. 2) ptest and ntest are the test data for positives and negatives, respectively.
## Preprocessing
scaler = StandardScaler()
scaler.fit(trdata)
trdata = scaler.transform(trdata)
ptest = scaler.transform(ptest)
ntest = scaler.transform(ntest)
## Building Classifier
# setting gamma and C ranges for grid-search optimization of an RBF-kernel SVM classifier
crange = 10.0 ** np.arange(-2, 9)
grange = 10.0 ** np.arange(-5, 4)
pgrid = dict(gamma=grange, C=crange)
cv = StratifiedKFold(y=tg, n_folds=3)
## Threshold Ranging
clf = GridSearchCV(SVC(), param_grid=pgrid, cv=cv, n_jobs=8)
## Training Classifier: Semi Supervised Algorithm
clf.fit(trdata, tg)  # note: n_jobs is a GridSearchCV argument, not a fit() argument
Problem 1) When I set n_jobs = 8 in GridSearchCV, the code runs up to the GridSearchCV call but then hangs (or takes an exceptionally long time without producing a result) at clf.fit, even for a very small dataset. When I remove it, both calls execute, but clf.fit takes very long to converge on large datasets. My data is a 600 x 12 matrix for both positives and negatives. Can you tell me what exactly n_jobs does and how it should be used? Also, is there a faster fitting technique or a modification to the code that would speed it up?
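For reference, here is a minimal sketch of how n_jobs is meant to be used, written against the current sklearn.model_selection API with synthetic data (the shapes and parameter ranges are illustrative, not your actual setup). n_jobs is a GridSearchCV constructor argument that parallelizes the per-candidate cross-validation fits across worker processes; fit() itself accepts no n_jobs argument.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(120, 12)               # synthetic stand-in for trdata
y = rng.randint(0, 2, 120)           # synthetic stand-in for tg

# a smaller grid than in the question, to keep the sketch fast
pgrid = {"C": 10.0 ** np.arange(-2, 3), "gamma": 10.0 ** np.arange(-3, 1)}
cv = StratifiedKFold(n_splits=3)

# n_jobs=2 runs two worker processes; n_jobs=-1 would use all cores.
# It parallelizes the 20 candidates x 3 folds = 60 independent fits.
clf = GridSearchCV(SVC(), param_grid=pgrid, cv=cv, n_jobs=2)
clf.fit(X, y)
print(clf.best_params_)
```

Note that on some platforms multiprocessing requires the script's entry point to be guarded with `if __name__ == "__main__":`, which is a common cause of grid searches hanging when n_jobs > 1.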
Problem 2) Should StandardScaler be fitted on the positive and negative data combined, or separately for each? I suppose it has to be combined, because only then can the same scaler parameters be applied to the test sets.
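The combined approach can be sketched as follows (the names trdata, ptest, and ntest follow the question; the data here is synthetic): fit one StandardScaler on all training rows together, then reuse its learned mean and scale to transform every test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
trdata = rng.randn(600, 12)   # combined training matrix (positives + negatives)
ptest = rng.randn(50, 12)     # positive test examples
ntest = rng.randn(50, 12)     # negative test examples

# learn per-feature mean/std from the training data only
scaler = StandardScaler().fit(trdata)

# apply the SAME fitted parameters to both test sets
ptest_s = scaler.transform(ptest)
ntest_s = scaler.transform(ntest)
```

Fitting separate scalers on positives and negatives would standardize each class with different parameters, which leaks class information into the features and makes the test-time transform ambiguous.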
Upvotes: 3
Views: 735
Reputation: 12689
SVC seems to be very sensitive to data that is not normalized; you may try normalizing the data with:
from sklearn import preprocessing
trdata = preprocessing.scale(trdata)
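One caveat worth noting (my addition, with synthetic data): preprocessing.scale standardizes an array in one shot but does not retain the fitted mean and standard deviation, so the identical transform cannot later be reapplied to test data. StandardScaler computes the same standardization while storing those parameters:

```python
import numpy as np
from sklearn import preprocessing

rng = np.random.RandomState(0)
trdata = rng.randn(100, 12)

# one-off standardization: no parameters are kept
scaled = preprocessing.scale(trdata)

# equivalent standardization, but mean_/scale_ are stored for reuse on test sets
scaler = preprocessing.StandardScaler().fit(trdata)
scaled_via_scaler = scaler.transform(trdata)

# both produce the same result on the training data itself
assert np.allclose(scaled, scaled_via_scaler)
```

For the asker's setup, where two separate test sets must be transformed with the training-data statistics, StandardScaler is the safer choice.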
Upvotes: 3