Reputation: 63599
After performing a grid search with sklearn.grid_search.GridSearchCV() on a linear_model.Ridge to find a suitable alpha, we can get the grid scores using clf.grid_scores_.
What do the numbers in the results mean? How do these numbers tell us which was the best alpha? Here's an example of a grid_scores_ result:
[({'alpha': 10.0},
-3.5395266121766391e-06,
array([ -5.81901982e-06, -5.27253774e-08, -4.74683464e-06])),
({'alpha': 5.0},
-3.5395266121766391e-06,
array([ -5.81901982e-06, -5.27253774e-08, -4.74683464e-06])),
({'alpha': 1.0},
-3.5395266121766391e-06,
array([ -5.81901982e-06, -5.27253774e-08, -4.74683464e-06])),
({'alpha': 0.5},
-3.5395266121766391e-06,
array([ -5.81901982e-06, -5.27253774e-08, -4.74683464e-06])),
({'alpha': 0.1},
-3.5395266121766391e-06,
array([ -5.81901982e-06, -5.27253774e-08, -4.74683464e-06])),
({'alpha': 0.05},
-3.5395266121766391e-06,
array([ -5.81901982e-06, -5.27253774e-08, -4.74683464e-06])),
({'alpha': 0.01},
0.00019276539505293697,
array([ 5.83095745e-04, -5.27253774e-08, -4.74683464e-06])),
({'alpha': 0.005},
0.072428630958501342,
array([ 0.07335483, 0.07190767, 0.07202339])),
({'alpha': 0.001},
0.37063142154124262,
array([ 0.37106198, 0.36953822, 0.37129406])),
({'alpha': 0.0005},
0.47042710942522803,
array([ 0.47063049, 0.4686987 , 0.47195214])),
({'alpha': 0.0001},
0.61100922361083054,
array([ 0.61189728, 0.60846248, 0.61266791]))]
Upvotes: 2
Views: 4250
Reputation: 1967
In general, it is a list of scores, with one entry per set of parameters.
Each element of the list is a triple <parameter dict, average score, list of scores over all folds>. The first element of the triple is the dictionary of parameters used for that particular run; in your case there is only one parameter, alpha. The second element is the average score over all the folds, i.e. over the list that is the third element of the triple. If you didn't specify your own score function, the default for Ridge regression is the coefficient of determination R^2. The last item in the triple is the array of scores for the individual folds (over which the average was computed). The number of folds is specified by the cv parameter (default is 3).
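As a quick sanity check on that structure, here is a minimal sketch that takes one triple from your output and recomputes the average from the per-fold scores. The variable names are my own, and I've written the array as a plain Python list so the snippet doesn't depend on any particular scikit-learn or NumPy version.

```python
# One entry copied from the question's grid_scores_ output,
# with the NumPy array written as a plain list.
params, mean_score, fold_scores = (
    {'alpha': 0.0001},
    0.61100922361083054,
    [0.61189728, 0.60846248, 0.61266791],
)

# Recompute the average over the folds (third element of the triple).
recomputed = sum(fold_scores) / len(fold_scores)

# The printed fold scores are rounded to 8 digits, so compare with a tolerance.
matches = abs(recomputed - mean_score) < 1e-6

print(params, recomputed, matches)
```

Three fold scores here correspond to the three folds used by the search.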
You typically want to find the triple which has the maximal average score. In your case, the maximum is at alpha 0.0001:
({'alpha': 0.0001},
0.61100922361083054,
array([ 0.61189728, 0.60846248, 0.61266791]))
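If you'd rather pick the best entry programmatically than by eye, something along these lines should work. It is only a sketch: the list below is a hand-copied subset of the triples from your question (arrays written as plain lists), and the names grid_scores, best_params and best_mean are mine, not part of scikit-learn.

```python
# A subset of the (params, mean score, fold scores) triples from the question.
grid_scores = [
    ({'alpha': 10.0}, -3.5395266121766391e-06,
     [-5.81901982e-06, -5.27253774e-08, -4.74683464e-06]),
    ({'alpha': 0.01}, 0.00019276539505293697,
     [5.83095745e-04, -5.27253774e-08, -4.74683464e-06]),
    ({'alpha': 0.005}, 0.072428630958501342,
     [0.07335483, 0.07190767, 0.07202339]),
    ({'alpha': 0.001}, 0.37063142154124262,
     [0.37106198, 0.36953822, 0.37129406]),
    ({'alpha': 0.0005}, 0.47042710942522803,
     [0.47063049, 0.4686987, 0.47195214]),
    ({'alpha': 0.0001}, 0.61100922361083054,
     [0.61189728, 0.60846248, 0.61266791]),
]

# Pick the triple whose average score (second element) is largest.
best_params, best_mean, best_folds = max(grid_scores, key=lambda entry: entry[1])

print(best_params, best_mean)
```

Note that the best alpha is the smallest value in your grid, so it may be worth extending the grid towards even smaller values to see whether the score keeps improving.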
Upvotes: 3