Setting the n_estimators argument using **kwargs (Scikit Learn)

Question

I am trying to follow this tutorial to learn machine-learning-based prediction, but I have two questions about it.

Ques1. How do I set n_estimators in the code below? Otherwise it will always use the default value.

from sklearn.cross_validation import KFold

def run_cv(X,y,clf_class,**kwargs):
    # Construct a kfolds object
    kf = KFold(len(y),n_folds=5,shuffle=True)
    y_pred = y.copy()

    # Iterate through folds
    for train_index, test_index in kf:
        X_train, X_test = X[train_index], X[test_index]
        y_train = y[train_index]
        # Initialize a classifier with key word arguments
        clf = clf_class(**kwargs)
        clf.fit(X_train,y_train)
        y_pred[test_index] = clf.predict(X_test)
    return y_pred

It is being called as:

from sklearn.svm import SVC
print "%.3f" % accuracy(y, run_cv(X,y,SVC))

Ques2: How do I use an already trained model file (e.g. one obtained from SVM) to predict additional (test) data that I didn't use for training?

maxymoo · Accepted Answer

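For your first question, any extra keyword arguments you give to run_cv are collected in **kwargs and passed to the classifier initializer in the step clf = clf_class(**kwargs). So for a classifier that accepts n_estimators, such as RandomForestClassifier, you would call run_cv(X,y,RandomForestClassifier,n_estimators=100). Note that SVC has no n_estimators parameter, so passing it with SVC raises a TypeError; with SVC you would pass its own parameters the same way, e.g. run_cv(X,y,SVC,C=10).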
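To make the keyword forwarding concrete, here is a minimal sketch of the same run_cv written against the current sklearn.model_selection API (sklearn.cross_validation was removed in scikit-learn 0.20). The synthetic data from make_classification, the fixed random_state values, and the inline accuracy computation are assumptions added for illustration; they are not part of the tutorial.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold


def run_cv(X, y, clf_class, **kwargs):
    # Construct a 5-fold splitter (seed fixed only for reproducibility)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    y_pred = y.copy()

    # Iterate through folds
    for train_index, test_index in kf.split(X):
        # Every extra keyword given to run_cv ends up here
        clf = clf_class(**kwargs)
        clf.fit(X[train_index], y[train_index])
        y_pred[test_index] = clf.predict(X[test_index])
    return y_pred


# Synthetic stand-in for the tutorial's data (assumption)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# n_estimators=100 travels through **kwargs into RandomForestClassifier
y_pred = run_cv(X, y, RandomForestClassifier, n_estimators=100, random_state=0)
accuracy = np.mean(y == y_pred)
print("%.3f" % accuracy)

# The same unpacking, shown in isolation
clf = RandomForestClassifier(**{"n_estimators": 100})
print(clf.n_estimators)
```

Calling run_cv(X, y, RandomForestClassifier) without the keyword would leave n_estimators at the library default.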

For your second question, the cross validation in the code you've linked is just for model evaluation, i.e. comparing different types of models and hyperparameters, and determining the likely effectiveness of your model in production. Once you've decided on your model, you need to refit the model on the whole dataset:

clf.fit(X,y)

Then you can get predictions with clf.predict or clf.predict_proba.
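As a rough sketch of that workflow: fit one classifier instance on all the labelled data, optionally save it to a file, and later load it to predict on rows it has never seen. The synthetic data, the split into "labelled" and "new" rows, and the file name svc_model.joblib are assumptions for illustration; joblib is one common way to persist a fitted scikit-learn model, not the only one.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the first 200 rows play the role of
# the full labelled dataset, the last 50 rows play the role of new data.
X_all, y_all = make_classification(n_samples=250, n_features=10, random_state=0)
X, y = X_all[:200], y_all[:200]
X_new = X_all[200:]

# Refit the chosen model on the whole labelled dataset
clf = SVC()
clf.fit(X, y)

# Save the fitted model to a file (hypothetical file name)
model_path = os.path.join(tempfile.mkdtemp(), "svc_model.joblib")
joblib.dump(clf, model_path)

# Later, possibly in another script: load the model and predict new rows
loaded = joblib.load(model_path)
labels = loaded.predict(X_new)
print(labels[:10])
```

A model saved this way is generally best loaded with the same scikit-learn version that produced it.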
