Reputation: 334
I am working on a project that involves a large dataset.
I need to train an SVM classifier using K-fold cross-validation from scikit-learn.
import pandas as pd
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
x__df_chunk_synth = pd.read_csv('C:/Users/anujp/Desktop/sort/semester 4/ATML/Sem project/atml_proj/Data/x_train_syn.csv')
y_df_chunk_synth = pd.read_csv('C:/Users/anujp/Desktop/sort/semester 4/ATML/Sem project/atml_proj/Data/y_train_syn.csv')
svm_clf = svm.SVC(kernel='poly', gamma=1, class_weight=None, max_iter=20000, C=100, tol=1e-5)
X = x__df_chunk_synth
Y = y_df_chunk_synth
scores = cross_val_score(svm_clf, X, Y, cv=5, scoring='f1_weighted')
print(scores)
pred = svm_clf.predict(chunk_test_x)
accuracy = accuracy_score(chunk_test_y, pred)
print(accuracy)
This is the code I am using. I understand that the classifier is trained inside cross_val_score, so when I call the classifier afterwards to predict on the test data, I get an error:
sklearn.exceptions.NotFittedError: This SVC instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
Is there a correct way to run the cross-validation and then predict on the test data?
Please help me with this issue.
Upvotes: 3
Views: 1712
Reputation: 88226
Indeed, model_selection.cross_val_score fits the model itself, so the input estimator does not have to be fitted beforehand. However, it does not fit the actual object passed as input but a copy of it, hence the error "This SVC instance is not fitted yet..." when trying to predict.
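To make this concrete, here is a minimal sketch that reproduces the behaviour. The synthetic data from make_classification and the default RBF kernel are assumptions made for illustration; they are not the asker's CSV files or SVC settings.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not the data from the question).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="rbf")

# cross_val_score fits one copy of clf per fold and returns one score per fold.
scores = cross_val_score(clf, X, y, cv=5, scoring="f1_weighted")

# The object we passed in was never fitted, so predict raises NotFittedError.
try:
    clf.predict(X)
    still_unfitted = False
except NotFittedError:
    still_unfitted = True

print(len(scores), still_unfitted)
```

The scores come from the per-fold copies, while clf itself is left exactly as it was constructed.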
Looking into the source code of cross_validate, which is called by cross_val_score, the estimator goes through clone first in the scoring step:
scores = parallel(
    delayed(_fit_and_score)(
        clone(estimator), X, y, scorers, train, test, verbose, None,
        fit_params, return_train_score=return_train_score,
        return_times=True, return_estimator=return_estimator,
        error_score=error_score)
    for train, test in cv.split(X, y, groups))
clone creates a new, unfitted estimator with the same parameters, which is why the actual input model is never fitted:
def clone(estimator, *, safe=True):
    """Constructs a new estimator with the same parameters.
    Clone does a deep copy of the model in an estimator
    without actually copying attached data. It yields a new estimator
    with the same parameters that has not been fit on any data.
    ...
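Given that, one way to get both the cross-validation scores and test predictions is to call fit on the estimator explicitly after cross-validating. The sketch below assumes synthetic data, a train/test split made with train_test_split, and a default-ish SVC; none of these come from the question. It also shows cross_validate with return_estimator=True (the same flag that appears in the source excerpt above), which hands back the fitted per-fold copies.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, cross_validate, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data and split (assumptions; not the asker's files).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = SVC(kernel="rbf", C=1.0)

# Step 1: cross-validation only estimates performance; clf stays unfitted.
scores = cross_val_score(clf, X_train, y_train, cv=5, scoring="f1_weighted")

# Step 2: fit the estimator itself on the training data, then predict.
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, pred)

# Alternative: ask cross_validate to return the fitted per-fold estimators.
cv_results = cross_validate(
    clf, X_train, y_train, cv=5, scoring="f1_weighted", return_estimator=True
)
fold_models = cv_results["estimator"]  # one fitted copy per fold
fold_pred = fold_models[0].predict(X_test)

print(scores, accuracy)
```

Refitting on the full training set is usually the simpler choice, since each per-fold estimator has only seen part of the training data.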
Upvotes: 5