Reputation: 7241
I am trying to evaluate multiple scoring metrics to determine the best parameters for model performance, i.e., to be able to say:
To maximize F1, I should use these parameters; to maximize precision, I should use these parameters.
I am working off the following example from this sklearn page:
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
X, y = make_hastie_10_2(n_samples=5000, random_state=42)
scoring = {'PRECISION': 'precision', 'F1': 'f1'}
gs = GridSearchCV(DecisionTreeClassifier(random_state=42),
                  param_grid={'min_samples_split': range(2, 403, 10)},
                  scoring=scoring, refit='F1', return_train_score=True)
gs.fit(X, y)
best_params = gs.best_params_
best_estimator = gs.best_estimator_
print(best_params)
print(best_estimator)
Which yields:
{'min_samples_split': 62}
DecisionTreeClassifier(min_samples_split=62, random_state=42)
However, what I am looking for is these results for each metric, so in this case for both F1 and precision.
How can I get the best parameters for each scoring metric in GridSearchCV?
Note - I believe it has something to do with my usage of refit='F1', but I am not sure how to handle multiple metrics there.
Upvotes: 5
Views: 2654
Reputation: 60321
To do so, you'll have to dig into the detailed results of the whole grid-search CV procedure; fortunately, these detailed results are returned in the cv_results_ attribute of the GridSearchCV object (docs).
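As a quick optional sanity check, you can confirm that with multiple scorers cv_results_ contains one mean_test_* and rank_test_* entry per metric; this small sketch simply lists those keys (it assumes the scoring dictionary from your question):
# With scoring={'PRECISION': 'precision', 'F1': 'f1'}, cv_results_ holds
# per-metric entries such as 'mean_test_PRECISION' and 'rank_test_F1'.
print(sorted(k for k in gs.cv_results_
             if k.startswith(('mean_test_', 'rank_test_'))))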
I have rerun your code as-is but am not repeating it here; suffice it to say that, despite explicitly setting the random number generator's seed, I get a different final result (presumably due to different scikit-learn versions):
{'min_samples_split': 322}
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=322,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=42, splitter='best')
but this is not important for the issue at hand here.
The easiest way to use the returned cv_results_ dictionary is to convert it to a pandas dataframe:
import pandas as pd
cv_results = pd.DataFrame.from_dict(gs.cv_results_)
Still, as it includes too much info (columns), I will further simplify it here to demonstrate the issue (feel free to explore it more fully yourself):
df = cv_results[['params', 'mean_test_PRECISION', 'rank_test_PRECISION', 'mean_test_F1', 'rank_test_F1']]
pd.set_option("display.max_rows", None, "display.max_columns", None)
pd.set_option('expand_frame_repr', False)
print(df)
Result:
params mean_test_PRECISION rank_test_PRECISION mean_test_F1 rank_test_F1
0 {'min_samples_split': 2} 0.771782 1 0.763041 41
1 {'min_samples_split': 12} 0.768040 2 0.767331 38
2 {'min_samples_split': 22} 0.767196 3 0.776677 29
3 {'min_samples_split': 32} 0.760282 4 0.773634 32
4 {'min_samples_split': 42} 0.754572 8 0.777967 26
5 {'min_samples_split': 52} 0.754034 9 0.777550 27
6 {'min_samples_split': 62} 0.758131 5 0.773348 33
7 {'min_samples_split': 72} 0.756021 6 0.774301 30
8 {'min_samples_split': 82} 0.755612 7 0.768065 37
9 {'min_samples_split': 92} 0.750527 10 0.771023 34
10 {'min_samples_split': 102} 0.741016 11 0.769896 35
11 {'min_samples_split': 112} 0.740965 12 0.765353 39
12 {'min_samples_split': 122} 0.731790 13 0.763620 40
13 {'min_samples_split': 132} 0.723085 14 0.768605 36
14 {'min_samples_split': 142} 0.713345 15 0.774117 31
15 {'min_samples_split': 152} 0.712958 16 0.776721 28
16 {'min_samples_split': 162} 0.709804 17 0.778287 24
17 {'min_samples_split': 172} 0.707080 18 0.778528 22
18 {'min_samples_split': 182} 0.702621 19 0.778516 23
19 {'min_samples_split': 192} 0.697630 20 0.778103 25
20 {'min_samples_split': 202} 0.693011 21 0.781047 10
21 {'min_samples_split': 212} 0.693011 21 0.781047 10
22 {'min_samples_split': 222} 0.693011 21 0.781047 10
23 {'min_samples_split': 232} 0.692810 24 0.779705 13
24 {'min_samples_split': 242} 0.692810 24 0.779705 13
25 {'min_samples_split': 252} 0.692810 24 0.779705 13
26 {'min_samples_split': 262} 0.692810 24 0.779705 13
27 {'min_samples_split': 272} 0.692810 24 0.779705 13
28 {'min_samples_split': 282} 0.692810 24 0.779705 13
29 {'min_samples_split': 292} 0.692810 24 0.779705 13
30 {'min_samples_split': 302} 0.692810 24 0.779705 13
31 {'min_samples_split': 312} 0.692810 24 0.779705 13
32 {'min_samples_split': 322} 0.688417 33 0.782772 1
33 {'min_samples_split': 332} 0.688417 33 0.782772 1
34 {'min_samples_split': 342} 0.688417 33 0.782772 1
35 {'min_samples_split': 352} 0.688417 33 0.782772 1
36 {'min_samples_split': 362} 0.688417 33 0.782772 1
37 {'min_samples_split': 372} 0.688417 33 0.782772 1
38 {'min_samples_split': 382} 0.688417 33 0.782772 1
39 {'min_samples_split': 392} 0.688417 33 0.782772 1
40 {'min_samples_split': 402} 0.688417 33 0.782772 1
The names of the columns should be self-explanatory: they include the parameters tried, the score for each of the metrics used, and the corresponding rank (1 meaning best). You can immediately see, for example, that although 'min_samples_split': 322 indeed gives the best F1 score, it is not the only parameter setting that does so; several other settings also achieve the best F1 score and hence a rank_test_F1 of 1 in the results.
From this point, it is trivial to get the info you want; for example, here are the best models for each of your two metrics:
print(df.loc[df['rank_test_PRECISION']==1]) # best precision
# result:
params mean_test_PRECISION rank_test_PRECISION mean_test_F1 rank_test_F1
0 {'min_samples_split': 2} 0.771782 1 0.763041 41
print(df.loc[df['rank_test_F1']==1]) # best F1
# result:
params mean_test_PRECISION rank_test_PRECISION mean_test_F1 rank_test_F1
32 {'min_samples_split': 322} 0.688417 33 0.782772 1
33 {'min_samples_split': 332} 0.688417 33 0.782772 1
34 {'min_samples_split': 342} 0.688417 33 0.782772 1
35 {'min_samples_split': 352} 0.688417 33 0.782772 1
36 {'min_samples_split': 362} 0.688417 33 0.782772 1
37 {'min_samples_split': 372} 0.688417 33 0.782772 1
38 {'min_samples_split': 382} 0.688417 33 0.782772 1
39 {'min_samples_split': 392} 0.688417 33 0.782772 1
40 {'min_samples_split': 402} 0.688417 33 0.782772 1
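If you prefer not to go through pandas, the same information can be pulled directly from cv_results_; the sketch below (an illustration, assuming the same scoring keys 'PRECISION' and 'F1' as above) picks, for each metric, the first parameter setting ranked 1:
import numpy as np
for metric in ('PRECISION', 'F1'):
    # rank 1 means best, so the index of the minimum rank is a best-scoring setting
    best_idx = np.argmin(gs.cv_results_['rank_test_' + metric])
    print(metric, gs.cv_results_['params'][best_idx],
          gs.cv_results_['mean_test_' + metric][best_idx])
Note that refit='F1' only controls which metric is used to refit the single best_estimator_ on the whole dataset; it does not affect the per-metric results stored in cv_results_.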
Upvotes: 7