Johnny

Reputation: 869

I am not clear on the meaning of the best_score_ from GridSearchCV

I ran an experiment with several models and generated a best score for each of them to help me decide on the one to choose for the final model. The best score results have been generated with the following code:

print(f'Ridge score is {np.sqrt(ridge_grid_search.best_score_ * -1)}')
print(f'Lasso score is {np.sqrt(lasso_grid_search.best_score_ * -1)}')
print(f'ElasticNet score is {np.sqrt(ElasticNet_grid_search.best_score_ * -1)}')
print(f'KRR score is {np.sqrt(KRR_grid_search.best_score_ * -1)}')
print(f'GradientBoosting score is {np.sqrt(gradientBoost_grid_search.best_score_ * -1)}')
print(f'XGBoosting score is {np.sqrt(XGB_grid_search.best_score_ * -1)}')
print(f'LGBoosting score is {np.sqrt(LGB_grid_search.best_score_ * -1)}')

The results are posted here:

Ridge score is 0.11353489315048314
Lasso score is 0.11118171778462431
ElasticNet score is 0.11122236468840378
KRR score is 0.11322596291030147
GradientBoosting score is 0.11111049287476948
XGBoosting score is 0.11404604560959673
LGBoosting score is 0.11299104859531962

I am not sure how to choose the best model. Is XGBoosting my best model in this case?

Upvotes: 0

Views: 403

Answers (1)

kaanaytekin

Reputation: 26

Your code is not provided, however from the name ridge_grid_search I suppose you are using sklearn.model_selection.GridSearchCV for model selection. GridSearchCV should be used to tune the hyperparameters of a single model and should not be used to compare different models with each other. ridge_grid_search.best_score_ returns the best mean cross-validated score achieved by the best hyperparameters found during the grid search of the given algorithm.
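As a minimal sketch of what best_score_ holds (assuming you used scoring="neg_mean_squared_error", which would explain the * -1 and np.sqrt in your code; the toy data and alpha grid below are only for illustration):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Toy regression data, only for illustration
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

# GridSearchCV tunes the hyperparameters of ONE estimator (here Ridge)
ridge_grid_search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    scoring="neg_mean_squared_error",
    cv=5,
)
ridge_grid_search.fit(X, y)

# best_score_ is the mean cross-validated score of the best alpha.
# With neg_mean_squared_error it is a negative MSE, so
# sqrt(-best_score_) gives the cross-validated RMSE.
print(ridge_grid_search.best_params_)
print(np.sqrt(-ridge_grid_search.best_score_))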

For model comparison you should use a cross-validation scheme such as k-fold cross-validation. While doing so, make sure every model is trained and tested on the same training/testing folds for a fair comparison.
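A minimal sketch of that idea (the model names and alpha values are hypothetical placeholders; reusing the same KFold object guarantees every model is evaluated on identical splits):

import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score

# X, y: your training features and target (assumed to exist, e.g. from the question)

# One fixed set of splits shared by all candidate models
cv = KFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Ridge": Ridge(alpha=1.0),    # plug in the best hyperparameters
    "Lasso": Lasso(alpha=0.001),  # found earlier by each grid search
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"{name} CV RMSE: {np.sqrt(-scores.mean()):.5f}")

The model with the lowest cross-validated RMSE on these shared folds would then be the one to prefer.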

Upvotes: 1
