Reputation: 771
I want to optimize the hyperparameters of an SVM with GridSearchCV, but the score of the best estimator is very different from the score I get when running the SVM with the best parameters.
#### Hyperparameter search with GridSearchCV ####
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", LinearSVC(loss='hinge'))])
param_grid = [{'svm__C': c_range}]  # c_range: candidate values for C, defined elsewhere
clf = GridSearchCV(pipeline, param_grid=param_grid, cv=5, scoring='accuracy')
clf.fit(X, y)
print('\nBest score: ', clf.best_score_)
#### scale train and test data ####
sc = StandardScaler()
sc.fit(X)
X = sc.transform(X)
X_test = sc.transform(X_test)
###### test best estimator with test data ###################
print("Best estimator score: ", clf.best_estimator_.score(X_test, y_test))
##### run SVM with the best found parameter #####
svc = LinearSVC(C=clf.best_params_['svm__C'])
svc.fit(X, y)
print("score with best parameter: ", svc.score(X_test, y_test))
The results are as follows:
Best score: 0.784
Best estimator score: 0.6991
score with best parameter: 0.7968
I don't understand why the scores of the best estimator and the SVM are different. Which of these results is the correct test accuracy? Why is the best estimator's score of 0.6991 so much worse? Have I done something wrong?
Upvotes: 0
Views: 731
Reputation: 4264
In the line below:
print("Best estimator score: ", clf.best_estimator_.score(X_test, y_test))
you are passing X_test, which is already scaled, to clf, which is a pipeline containing another StandardScaler. So essentially you are scaling your data twice, as opposed to your last score call, where you pass the already scaled data to svc, which just does the model fitting without scaling. The data fed in the two cases is therefore quite different, and so your predictions (and scores) differ as well.
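The fix is to be consistent: either let the pipeline see raw data (it scales internally), or scale once yourself and score a bare estimator. Below is a minimal runnable sketch on synthetic data (the dataset, the C grid, and the fixed random_state are illustrative assumptions, not from your question) showing that the two routes then agree:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Illustrative data; your X, y, X_test, y_test would replace these.
X, X_test, y, y_test = train_test_split(
    *make_classification(n_samples=1000, random_state=0), random_state=0)

pipeline = Pipeline([("scaler", StandardScaler()),
                     ("svm", LinearSVC(loss='hinge', random_state=0))])
clf = GridSearchCV(pipeline, [{'svm__C': [0.01, 0.1, 1, 10]}],
                   cv=5, scoring='accuracy')
clf.fit(X, y)  # the pipeline scales inside each CV fold

# Route 1: pass the *raw* test data; the scaler fitted inside the
# pipeline transforms it exactly once.
print("pipeline score:", clf.best_estimator_.score(X_test, y_test))

# Route 2: scale once with a scaler fit on the training data, then
# refit a bare LinearSVC with the best C on the scaled training data.
sc = StandardScaler().fit(X)
svc = LinearSVC(C=clf.best_params_['svm__C'], loss='hinge', random_state=0)
svc.fit(sc.transform(X), y)
print("manual score:  ", svc.score(sc.transform(X_test), y_test))

With the random seed fixed, both prints should report the same accuracy, because the pipeline's internal scaler and the manual scaler are fit on the same training data.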
Hope this helps!
Upvotes: 1