mmbb

Reputation: 337

How to use the classifier trained by cross_val_score

I am attempting to train a cross validated SVM model (for a school project). Given X and y, when I call

clf = svm.SVC(gamma='scale')
scores = cross_val_score(clf, X, y, cv=4)

scores is set to an array as expected, but I also want to be able to call clf.predict(test_x). When I do, it throws an exception with the message This SVC instance is not fitted yet. Call 'fit' with appropriate arguments before using this method. (I wish cross_val_score would return something like [scores, predictor], or maybe a CrossValidationPredictor with a predict method, but that is not the case.)

Of course, I can call classifier = clf.fit(X, y), but that doesn't give me a cross-validated SVM predictor. How do I get a cross-validated predictor that I can use to, you know, predict?

Upvotes: 3

Views: 132

Answers (2)

PV8

Reputation: 6260

Maybe you can have a look at grid search:

Grid-search

scikit-learn provides an object that, given data, computes the score during the fit of an estimator on a parameter grid and chooses the parameters to maximize the cross-validation score. This object takes an estimator during the construction and exposes an estimator API

Example:

>>> import numpy as np
>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import GridSearchCV, cross_val_score
>>> X_digits, y_digits = datasets.load_digits(return_X_y=True)
>>> svc = svm.SVC(kernel='linear')
>>> Cs = np.logspace(-6, -1, 10)
>>> clf = GridSearchCV(estimator=svc, param_grid=dict(C=Cs),
...                    n_jobs=-1)
>>> clf.fit(X_digits[:1000], y_digits[:1000])
GridSearchCV(cv=None,...
>>> clf.best_score_
0.925...
>>> clf.best_estimator_.C
0.0077...

>>> # Prediction performance on test set is not as good as on train set
>>> clf.score(X_digits[1000:], y_digits[1000:])

Here is the site for checking it: https://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html
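Note that this also answers the predict question directly: by default GridSearchCV refits the best estimator on the whole training set (refit=True), so after fit you can call predict on the GridSearchCV object itself. A minimal sketch, using the digits dataset as a stand-in for the asker's X and y:

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV

# Stand-in data for the asker's X and y.
X, y = load_digits(return_X_y=True)

# Search over a handful of C values with 4-fold cross-validation.
Cs = np.logspace(-6, -1, 10)
clf = GridSearchCV(estimator=svm.SVC(gamma='scale'),
                   param_grid=dict(C=Cs), cv=4)
clf.fit(X[:1000], y[:1000])

# Because refit=True (the default), the best estimator was retrained
# on all of X[:1000], so predict works directly on clf.
predictions = clf.predict(X[1000:])
print(clf.best_params_, predictions.shape)
```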

Upvotes: 1

MB-F

Reputation: 23637

Of course, I can call classifier = clf.fit(X, y) but that doesn't give me a cross validated SVM predictor, how do I get a cross validated predictor that I can use to—you know—predict?

clf.fit(X, y) is exactly what you should do.

There is no such thing as a cross-validated predictor, because cross-validation is not a method for training a predictor but, well, for validating a type of predictor. To quote the Wikipedia entry:

Cross-validation [...] is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.

(Statistical analysis, here, includes prediction models such as regressors or classifiers.)

The question that cross-validation answers is "How well will my classifier perform later, when I apply it to data I don't have yet?". Usually you cross-validate different classifiers or hyperparameters and then select the one with the highest score, which is the one expected to generalize best to unseen data.

Finally, you train that classifier on the full data set, because you want to deploy the best possible classifier.
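Put together, the two-step workflow described above could look like this (a sketch using the asker's variable names, with the digits dataset standing in for their data):

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split

# Stand-in data for the asker's X, y and test_x.
X_full, y_full = load_digits(return_X_y=True)
X, test_x, y, test_y = train_test_split(X_full, y_full, random_state=0)

clf = svm.SVC(gamma='scale')

# Step 1: cross-validation estimates how well this *kind* of model
# will generalize; it does not leave a fitted model behind.
scores = cross_val_score(clf, X, y, cv=4)
print("estimated accuracy:", scores.mean())

# Step 2: fit on the full training data to get a usable predictor.
clf.fit(X, y)
predictions = clf.predict(test_x)
```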

Upvotes: 2
