GridSearchCV's documentation states that I can pass a scoring function.
scoring : string, callable or None, default=None
I would like to use a native accuracy_score as a scoring function.
So here is my attempt. Imports and some data:
import numpy as np
from sklearn.cross_validation import KFold, cross_val_score
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn import neighbors
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([0, 1, 0, 0, 0, 1])
Now when I use just k-fold cross-validation without my scoring function, everything works as intended:
parameters = {
'n_neighbors': [2, 3, 4],
'weights':['uniform', 'distance'],
'p': [1, 2, 3]
}
model = neighbors.KNeighborsClassifier()
k_fold = KFold(len(Y), n_folds=6, shuffle=True, random_state=0)
clf = GridSearchCV(model, parameters, cv=k_fold) # TODO will change
clf.fit(X, Y)
print clf.best_score_
But when I change the line to
clf = GridSearchCV(model, parameters, cv=k_fold, scoring=accuracy_score) # or accuracy_score()
I get the error: ValueError: Cannot have number of folds n_folds=10 greater than the number of samples: 6.
which, in my opinion, does not reflect the real problem.
I think the real problem is that accuracy_score does not follow the signature scorer(estimator, X, y) described in the documentation.
So how can I fix this problem?
It will work if you change scoring=accuracy_score to scoring='accuracy' (see the documentation for the full list of scorers you can use by name in this way).
In theory, you should be able to pass custom scoring functions like you're trying, but you're right that accuracy_score doesn't have the right API: it takes (y_true, y_pred), whereas a scoring callable is called as scorer(estimator, X, y).
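The sketch below compares three ways of requesting accuracy on the question's data: by name, by wrapping the metric with make_scorer, and with a hand-written callable that has the scorer(estimator, X, y) signature. It assumes a scikit-learn version that provides sklearn.model_selection (0.18 or later), where KFold takes n_splits rather than the number of samples. The helper names accuracy_callable and best_score are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

# Same toy data and parameter grid as in the question
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([0, 1, 0, 0, 0, 1])
parameters = {
    "n_neighbors": [2, 3, 4],
    "weights": ["uniform", "distance"],
    "p": [1, 2, 3],
}


def accuracy_callable(estimator, X, y):
    """Hypothetical hand-written scorer with the (estimator, X, y) signature."""
    return accuracy_score(y, estimator.predict(X))


def best_score(scoring):
    """Run the same grid search with the given scoring argument."""
    k_fold = KFold(n_splits=6, shuffle=True, random_state=0)
    search = GridSearchCV(KNeighborsClassifier(), parameters,
                          cv=k_fold, scoring=scoring)
    search.fit(X, Y)
    return search.best_score_


score_by_name = best_score("accuracy")
score_by_wrapper = best_score(make_scorer(accuracy_score))
score_by_callable = best_score(accuracy_callable)
print(score_by_name, score_by_wrapper, score_by_callable)
```

All three runs use identical folds, so they should report the same best score. What does not work is passing the bare accuracy_score function, because it expects (y_true, y_pred).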
Here is an example of using weighted kappa as the scoring metric for GridSearchCV with a simple random forest model. The key lesson for me was that the metric's own parameters (here weights="quadratic") are passed to make_scorer.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import cohen_kappa_score, make_scorer

# Extra keyword arguments given to make_scorer are forwarded to the metric
kappa_scorer = make_scorer(cohen_kappa_score, weights="quadratic")

# Create the parameter grid based on the results of random search
param_grid = {
    'bootstrap': [True],
    'max_features': range(2, 10),  # try 2 to 9 features (the upper bound is exclusive)
    'min_samples_leaf': [3, 4, 5],
    'n_estimators': [100, 300, 500],
    'max_depth': [5]
}

# Create a base model
random_forest = RandomForestClassifier(class_weight="balanced_subsample", random_state=1)

# Instantiate the grid search model, scored by quadratic weighted kappa
grid_search = GridSearchCV(estimator=random_forest, param_grid=param_grid,
                           cv=5, n_jobs=-1, verbose=2, scoring=kappa_scorer)

# Fit the grid search to the data
# (final_tr and yTrain are the author's training features and labels, not defined here)
grid_search.fit(final_tr, yTrain)
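To see what make_scorer does with the extra weights argument, the following sketch checks that the scorer returns the same value as calling cohen_kappa_score directly with weights="quadratic". The data is synthetic and comes from make_classification, since final_tr and yTrain are not shown. The names X_demo, y_demo and forest are placeholders introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer

# Synthetic three-class data standing in for the answer's training set
X_demo, y_demo = make_classification(n_samples=120, n_features=10,
                                     n_informative=5, n_classes=3,
                                     random_state=0)

# Scorer that forwards weights="quadratic" to cohen_kappa_score
kappa_scorer = make_scorer(cohen_kappa_score, weights="quadratic")

forest = RandomForestClassifier(n_estimators=20, random_state=1)
forest.fit(X_demo, y_demo)

# A scorer is called as scorer(estimator, X, y); it predicts internally
via_scorer = kappa_scorer(forest, X_demo, y_demo)

# Equivalent direct call on the metric itself
direct = cohen_kappa_score(y_demo, forest.predict(X_demo), weights="quadratic")

print(via_scorer, direct)
```

The comparison is made on the training data only to show that the two calls agree. It is not meant as an estimate of model quality.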