I have a question about the difference between the random_search.score method (score 1) and the random_search.best_score_ attribute (score 2), particularly when random_search.score is applied to X_train and y_train.
I thought RandomizedSearchCV automatically looks for the params that give the highest score on the training set? I would have assumed random_search.score(X_train, y_train) would be the same as the score of random_search.best_params_?
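The data itself isn't the point here, but for a self-contained sketch, assume a setup along these lines (make_regression is just a stand-in, not my actual dataset):

# Stand-in data setup -- the real dataset is not shown in this question,
# so a synthetic regression problem is used purely as a placeholder.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)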
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

def evaluate_model(model, param_grid, n_iter=100):
    random_search = RandomizedSearchCV(model,
                                       param_grid,
                                       cv=5,
                                       n_jobs=2,
                                       verbose=1,
                                       n_iter=n_iter)
    random_search.fit(X_train, y_train)
    print(random_search.score(X_train, y_train))  # Score 1
    print(random_search.best_score_)              # Score 2
    print(random_search.score(X_test, y_test))    # Score 3
    return random_search

rgr = GradientBoostingRegressor(n_estimators=50)
param_grid = {"max_depth": range(1, 10, 1)}
gradient_boosting = evaluate_model(rgr, param_grid)
Instead, this returns:
Score 1: 0.9585014239352219
Score 2: 0.7129331788310186
Score 3: 0.7530744077231204
With random_search.score(X_train, y_train), you are testing on the same data you used for training, hence such a high score. This number is (almost) completely meaningless**, as it doesn't tell you how well your model will perform on unseen data.
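You can see what score 1 actually is: with the default refit=True, random_search.score simply delegates to the best_estimator_ that was refit on X_train, evaluated here on the very data it was fitted on (a small sketch against the random_search object from the question):

# With refit=True (the default), random_search.score(X, y) evaluates the
# best_estimator_ refit on all of X_train, using its default R^2 scorer.
train_r2 = random_search.best_estimator_.score(X_train, y_train)
print(train_r2)  # same value as random_search.score(X_train, y_train), i.e. score 1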
cv=5 means your training data was partitioned 5 times for each hyper-parameter setting, with 20% of the data used for testing and 80% used for training in each partition. The results on these 5 test folds are then averaged. The highest such average among all sampled hyper-parameter settings is recorded in random_search.best_score_. So the crucial difference is that here you aren't evaluating performance on the same data used for training, hence the comparably lower score.
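You can verify this from cv_results_: best_score_ is just the highest mean_test_score among the sampled settings, and that mean is the average of the 5 per-fold scores (a sketch against the fitted random_search):

import numpy as np

results = random_search.cv_results_

# best_score_ is the best mean cross-validated score among the sampled settings...
print(np.max(results["mean_test_score"]))            # equals random_search.best_score_
print(results["params"][random_search.best_index_])  # the winning hyper-parameter setting

# ...and that mean is the average of the 5 per-fold test scores for this setting.
fold_scores = [results[f"split{i}_test_score"][random_search.best_index_]
               for i in range(5)]
print(np.mean(fold_scores))                           # again equals best_score_ (up to rounding)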
random_search.score(X_test, y_test) is the same as best_score_ in that you are evaluating the model on unseen data, but it is a better indicator of actual generalization performance. Unlike score 2, however, here your model has been trained on 100% of the training data (as opposed to 80%). This is one possible explanation as to why score 3 is better than score 2.
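To make that concrete, here is a sketch (assuming the rgr and random_search objects from the question): refitting the winning setting on all of X_train and scoring it on the test set is essentially what score 3 reports.

from sklearn.base import clone

# Re-create what the search does after choosing best_params_: refit the winning
# setting on ALL of X_train, then evaluate it on the held-out test set (score 3).
refit_model = clone(rgr).set_params(**random_search.best_params_)
refit_model.fit(X_train, y_train)
print(refit_model.score(X_test, y_test))  # should closely match random_search.score(X_test, y_test)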
**If this value is low, you know your model is underfitting and should try increasing model complexity, e.g. adding more hidden layers to a neural network, or increasing the max_depth of a decision tree.