Reputation: 337
I am attempting to train a cross-validated SVM model (for a school project). Given X and y, when I call
from sklearn import svm
from sklearn.model_selection import cross_val_score
clf = svm.SVC(gamma='scale')
scores = cross_val_score(clf, X, y, cv=4)
scores is set to an array of scores, as expected, but I also want to be able to call clf.predict(test_x). When I do, it throws an exception with the message This SVC instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
(I wish cross_val_score would return something like [scores, predictor], or maybe a CrossValidationPredictor that has a predict method, but that is not the case.)
Of course, I can call classifier = clf.fit(X, y), but that doesn't give me a cross-validated SVM predictor. How do I get a cross-validated predictor that I can use to, you know, predict?
Upvotes: 3
Views: 132
Reputation: 6260
Maybe you can have a look at grid search:
Grid-search
scikit-learn provides an object that, given data, computes the score during the fit of an estimator on a parameter grid and chooses the parameters to maximize the cross-validation score. This object takes an estimator during the construction and exposes an estimator API
Example:
>>> import numpy as np
>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import GridSearchCV, cross_val_score
>>> X_digits, y_digits = datasets.load_digits(return_X_y=True)
>>> svc = svm.SVC(kernel='linear')
>>> Cs = np.logspace(-6, -1, 10)
>>> clf = GridSearchCV(estimator=svc, param_grid=dict(C=Cs),
...                    n_jobs=-1)
>>> clf.fit(X_digits[:1000], y_digits[:1000])
GridSearchCV(cv=None,...
>>> clf.best_score_
0.925...
>>> clf.best_estimator_.C
0.0077...
>>> # Prediction performance on the test set is not as good as on the training set
>>> clf.score(X_digits[1000:], y_digits[1000:])
The full example is in the scikit-learn tutorial: https://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html
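Note that with the default refit=True, GridSearchCV refits the best estimator on the whole training set once the search is finished, so the fitted clf can be used as a predictor right away. A minimal sketch, continuing the example above:
>>> # refit=True (the default) means clf now wraps the best estimator,
>>> # retrained on all of X_digits[:1000], so it can predict directly.
>>> predictions = clf.predict(X_digits[1000:])
Calling clf.predict here is equivalent to calling clf.best_estimator_.predict.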
Upvotes: 1
Reputation: 23637
Of course, I can call
classifier = clf.fit(X, y)
but that doesn't give me a cross-validated SVM predictor. How do I get a cross-validated predictor that I can use to, you know, predict?
clf.fit(X, y)
is exactly what you should do.
There is no such thing as a cross-validated predictor, because cross-validation is not a method for training a predictor but, well, for validating a type of predictor. Let me quote the Wikipedia entry:
Cross-validation [...] is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.
(Statistical analysis, here, includes prediction models such as regressors or classifiers.)
The question that cross-validation answers is "How well will my classifier perform later, when I apply it to data I don't have yet?". Usually you cross-validate different classifiers or hyperparameters and then select the one with the highest score, which is the one that is expected to generalize best to unseen data.
Finally, you train that classifier on the full data set, because you want to deploy the best possible classifier.
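A minimal sketch of that workflow, assuming the X, y and test_x from the question (the two candidate kernels are just placeholders):
from sklearn import svm
from sklearn.model_selection import cross_val_score

# Cross-validate a few candidate classifiers to estimate how each one generalizes.
candidates = [svm.SVC(kernel='rbf', gamma='scale'), svm.SVC(kernel='linear')]
mean_scores = [cross_val_score(c, X, y, cv=4).mean() for c in candidates]

# Keep the candidate with the best cross-validation score ...
best = candidates[mean_scores.index(max(mean_scores))]

# ... and train it on the full data set before using it to predict.
best.fit(X, y)
predictions = best.predict(test_x)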
Upvotes: 2