Reputation:
I am trying to follow this tutorial to learn the machine learning based prediction but I have got two questions on it?
Ques1. How to set the n_estimators
in the below piece of code, otherwise it will always assume the default value.
from sklearn.cross_validation import KFold
def run_cv(X,y,clf_class,**kwargs):
# Construct a kfolds object
kf = KFold(len(y),n_folds=5,shuffle=True)
y_pred = y.copy()
# Iterate through folds
for train_index, test_index in kf:
X_train, X_test = X[train_index], X[test_index]
y_train = y[train_index]
# Initialize a classifier with key word arguments
clf = clf_class(**kwargs)
clf.fit(X_train,y_train)
y_pred[test_index] = clf.predict(X_test)
return y_pred
It is being called as:
from sklearn.svm import SVC
print "%.3f" % accuracy(y, run_cv(X,y,SVC))
Ques2: How to use the already trained model file (e.g. obtained from SVM) so that I can use it to predict more (test) data which I didn't used for training?
Upvotes: 3
Views: 1880
Reputation: 36545
For your first question, in the above code you would call run_cv(X,y,SVC,n_classifiers=100)
, the **kwargs
will pass this to the classifier initializer with the step clf = clf_class(**kwargs)
.
For your second question, the cross validation in the code you've linked is just for model evaluation, i.e. comparing different types of models and hyperparameters, and determining the likely effectiveness of your model in production. Once you've decided on your model, you need to refit the model on the whole dataset:
clf.fit(X,y)
Then you can get predictions with clf.predict
or clf.predict_proba
.
Upvotes: 2