Reputation: 767
Currently I have the following code:
I start by splitting the dataset into training and test sets. I then run GridSearchCV to find the optimal parameters. Once I have found them, I assess the classifier with those parameters via cross_val_score. Is this an acceptable approach?
Upvotes: 5
Views: 3881
Reputation: 8801
You can specify a scoring parameter inside the GridSearchCV object using make_scorer, like this:
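from sklearn.metrics import precision_score, make_scorer
from sklearn.model_selection import GridSearchCV
# logreg is your estimator, param_grid your dict of candidate parameter values
prec_metric = make_scorer(precision_score)
grid_search = GridSearchCV(estimator=logreg, param_grid=param_grid, scoring=prec_metric, cv=3, n_jobs=-1, verbose=3)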
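A minimal end-to-end sketch of the snippet above, so it can be run as-is. The synthetic data from make_classification, the LogisticRegression(max_iter=1000) estimator and the grid over C are assumptions standing in for the asker's own logreg and param_grid.

```python
# Sketch: GridSearchCV that selects parameters by precision via make_scorer.
# Assumed stand-ins: synthetic data, LogisticRegression, a hypothetical grid over C.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

logreg = LogisticRegression(max_iter=1000)
param_grid = {"C": [0.01, 0.1, 1, 10]}  # hypothetical grid

prec_metric = make_scorer(precision_score)
grid_search = GridSearchCV(estimator=logreg, param_grid=param_grid, scoring=prec_metric, cv=3)
grid_search.fit(X_train, y_train)  # the search only ever sees the training split

print(grid_search.best_params_)
print(grid_search.best_score_)  # mean cross-validated precision of the best candidate
```

Note that best_score_ is computed on the inner CV folds of the training data, not on X_test.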
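Once you have fitted your data, you can use the cv_results_ attribute to access the scores. For example, for a kernel model (such as an SVC) searched over kernel, gamma and degree with 2-fold CV, it looks like this (the values are illustrative, and the *_train_score keys only appear if you pass return_train_score=True):
results = grid_search.cv_results_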
{
'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                             mask = [False False False False]...),
'param_gamma': masked_array(data = [-- -- 0.1 0.2],
                            mask = [ True  True False False]...),
'param_degree': masked_array(data = [2.0 3.0 -- --],
                             mask = [False False  True  True]...),
'split0_test_score'  : [0.80, 0.70, 0.80, 0.90],
'split1_test_score'  : [0.82, 0.50, 0.70, 0.78],
'mean_test_score'    : [0.81, 0.60, 0.75, 0.84],
'std_test_score'     : [0.01, 0.10, 0.05, 0.06],
'rank_test_score'    : [2, 4, 3, 1],
'split0_train_score' : [0.80, 0.90, 0.70, ...],
'split1_train_score' : [0.82, 0.50, 0.70, ...],
'mean_train_score'   : [0.81, 0.70, 0.70, ...],
'std_train_score'    : [0.01, 0.20, 0.00, ...],
'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
'mean_score_time'    : [0.007, 0.06, 0.04, 0.04],
'std_score_time'     : [0.001, 0.002, 0.003, 0.005],
'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
}
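One way to read that dictionary is to walk the per-candidate arrays in parallel, since every array is indexed in the same order as results["params"]. This sketch assumes the same hypothetical setup as before (synthetic data, LogisticRegression, a grid over C) rather than the SVC grid shown above.

```python
# Sketch: reading cv_results_ candidate by candidate.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # hypothetical grid
    scoring=make_scorer(precision_score),
    cv=3,
)
grid_search.fit(X, y)

results = grid_search.cv_results_
# All arrays share the candidate order of results["params"]
for params, mean, std, rank in zip(
    results["params"], results["mean_test_score"],
    results["std_test_score"], results["rank_test_score"],
):
    print(f"rank {rank}: {params} precision={mean:.3f} (+/- {std:.3f})")
```

With cv=3 there are split0_ to split2_ test-score keys; train-score keys are absent because return_train_score was left at its default.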
You can also use multiple metrics for evaluation as mentioned in this example.
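A sketch of multi-metric evaluation, assuming the same hypothetical data and grid: scoring takes a dict of named scorers, and refit names the metric that decides best_params_. Each metric then gets its own mean_test_<name> and rank_test_<name> entries in cv_results_.

```python
# Sketch: several metrics in one grid search; refit chooses which one selects the winner.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

scoring = {"precision": "precision", "recall": "recall", "f1": "f1"}
grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # hypothetical grid
    scoring=scoring,
    refit="precision",  # best_params_ is chosen by precision
    cv=3,
)
grid_search.fit(X, y)

res = grid_search.cv_results_
for name in scoring:
    print(name, res[f"mean_test_{name}"])
print(grid_search.best_params_)
```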
You can make your own custom metric or use one of the metrics specified here.
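For a custom metric, any function with the signature metric(y_true, y_pred) can be wrapped with make_scorer. The specificity function below is a hypothetical example (true-negative rate for a binary 0/1 target), and the data and grid are the same assumed stand-ins as above.

```python
# Sketch: wrapping a hypothetical custom metric with make_scorer.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV


def specificity(y_true, y_pred):
    """Hypothetical custom metric: TN / (TN + FP) for labels 0 and 1."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tn / (tn + fp) if (tn + fp) else 0.0


specificity_scorer = make_scorer(specificity)  # greater_is_better=True by default

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # hypothetical grid
    scoring=specificity_scorer,
    cv=3,
)
grid_search.fit(X, y)
print(grid_search.best_params_, grid_search.best_score_)
```

For a loss (lower is better), pass greater_is_better=False so that the grid search still maximizes the resulting score.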
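Update: Based on this answer, you should pass the grid_search object itself to cross_val_score, before fitting it on the whole dataset. That way the parameter search is rerun inside each outer training fold, and the data used to evaluate the classifier never influences which parameters are chosen (no data leakage).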
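This is nested cross-validation. A minimal sketch, again assuming synthetic data, LogisticRegression and a hypothetical grid over C: the unfitted GridSearchCV is the estimator handed to cross_val_score, which clones and fits it once per outer fold.

```python
# Sketch: nested CV. Inner loop = GridSearchCV, outer loop = cross_val_score.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop: picks C using only each outer training fold
grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # hypothetical grid
    scoring=make_scorer(precision_score),
    cv=3,
)

# Outer loop: each outer test fold scores a search it took no part in
nested_scores = cross_val_score(grid_search, X, y, scoring=make_scorer(precision_score), cv=5)
print(nested_scores.mean(), nested_scores.std())
```

Because cross_val_score works on clones, grid_search itself stays unfitted afterwards; fit it on the full data separately if you need the final model.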
Upvotes: 5
Reputation: 189
You actually don't need cross_val_score. Check out this link; I think it will sort things out for you:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html
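A minimal sketch in the spirit of that approach (not a copy of the linked example): run the grid search on the training split only, then score the refitted best model once on the held-out test split. The data, estimator and grid are assumed stand-ins.

```python
# Sketch: tune with CV on the training split, evaluate once on the untouched test split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, precision_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # hypothetical grid
    scoring="precision",
    cv=3,
)
grid_search.fit(X_train, y_train)  # cross-validation happens inside X_train only

# refit=True (the default) retrains the best candidate on all of X_train,
# so grid_search can predict directly on data it has never seen
y_pred = grid_search.predict(X_test)
test_precision = precision_score(y_test, y_pred)
print(grid_search.best_params_)
print(classification_report(y_test, y_pred))
```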
Upvotes: 1