Reputation: 45741
In the code below, I am trying to understand the connection between best_estimator_ and best_score_. I think I should be able to get best_score_ (or at least a very close approximation to it) by scoring the predictions of best_estimator_, like so:
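import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
# penalty='l1' needs a solver that supports it; liblinear was the default when this was written
classifier = GridSearchCV(LogisticRegression(penalty='l1', solver='liblinear'),
                          {'C':10**(np.linspace(1,6,num=11))},
                          scoring='neg_log_loss')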
classifier.fit(X_train, y_train)
y_pred = classifier.best_estimator_.predict(X_train)
print(f'{log_loss(y_train,y_pred)}')
print(f'{classifier.best_score_}')
However, I get the following outputs (the numbers do not vary much between runs):
7.841241697018637
-0.5470694752031108
I understand that best_score_ is calculated as an average over the cross-validation iterations, but surely this should be a close approximation (an unbiased estimator, even?) of the metric calculated on the whole set. I don't understand why they are so very different, so I assume that I've made an implementation error.
How can I calculate classifier.best_score_ myself?
Upvotes: 0
Views: 3295
Reputation: 36599
log_loss is defined on predicted probabilities, i.e. the output of predict_proba(), not on the hard class labels returned by predict().
I am assuming that GridSearchCV internally calls predict_proba() and then calculates the score.
Change predict() to predict_proba() and you will see similar results (apart from the sign).
y_pred = classifier.best_estimator_.predict_proba(X_train)
print(log_loss(y_train,y_pred))
print(classifier.best_score_)
On the iris dataset, I get the following output:
0.165794760809
-0.185370083771
which looks quite close.
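The two numbers are close but not identical because best_score_ is the mean score over the held-out cross-validation folds, while the line above scores the refitted model on the data it was trained on. To reproduce best_score_ itself, one option is to re-run the cross-validation for the winning parameters with the same splits. Below is a minimal sketch of that idea; the iris data, the small C grid, the default L2 penalty, and the explicit StratifiedKFold object are my assumptions for illustration, not the asker's setup.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Unshuffled folds, shared by the search and the manual check,
# so both see exactly the same train/test splits.
cv = StratifiedKFold(n_splits=5)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1.0, 10.0]},  # illustrative grid, not the one from the question
    scoring="neg_log_loss",
    cv=cv,
)
search.fit(X, y)

# Re-run the cross-validation for the winning parameters only
# and average the per-fold scores.
best_model = LogisticRegression(max_iter=1000, **search.best_params_)
fold_scores = cross_val_score(best_model, X, y, scoring="neg_log_loss", cv=cv)
manual_best_score = fold_scores.mean()

# The same value can also be read directly from the search results.
stored_best_score = search.cv_results_["mean_test_score"][search.best_index_]

# Loss of the refitted model on its own training data: a different quantity.
train_loss = log_loss(y, search.best_estimator_.predict_proba(X))

print(manual_best_score, stored_best_score, search.best_score_)
print(train_loss)
```

The first three printed values should agree, while the training-set loss is a separate (and typically more optimistic) number with the opposite sign.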
Update:
Looks like this is the case. When you supply 'neg_log_loss' as a string to GridSearchCV, this is how the scorer is initialized before being passed on to the internal _fit_and_score() function used by GridSearchCV:
log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
needs_proba=True)
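As you can see, needs_proba is True, which means that predict_proba() will be used for scoring. And because greater_is_better is False, the scorer returns the negated log loss, which is why best_score_ is negative.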
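To check this behaviour without reading the library source, the built-in scorer can be fetched by name and compared against a manual computation. This is a small sketch, assuming the iris data and a plain LogisticRegression as stand-ins. It uses get_scorer rather than rebuilding the scorer with make_scorer, since I believe newer scikit-learn releases replace the needs_proba argument with a response_method argument.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer, log_loss

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Look up the scorer that the string 'neg_log_loss' resolves to.
scorer = get_scorer("neg_log_loss")

# Scorers are called as scorer(estimator, X, y).
scorer_value = scorer(model, X, y)

# Manual equivalent: log loss on predicted probabilities, then negated.
proba_loss = log_loss(y, model.predict_proba(X))

print(scorer_value, -proba_loss)
```

Both printed values should match, which shows that the scorer works on probabilities and flips the sign of the loss.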
Upvotes: 1