Salvador Dali

Reputation: 222531

Pass a scoring function from sklearn.metrics to GridSearchCV

GridSearchCV's documentation states that I can pass a scoring function:

scoring : string, callable or None, default=None

I would like to use the built-in accuracy_score from sklearn.metrics as the scoring function.

So here is my attempt. Imports and some data:

import numpy as np
from sklearn.cross_validation import KFold, cross_val_score
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn import neighbors

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([0, 1, 0, 0, 0, 1])

Now when I use just k-fold cross-validation without my scoring function, everything works as intended:

parameters = {
    'n_neighbors': [2, 3, 4],
    'weights':['uniform', 'distance'],
    'p': [1, 2, 3]
}
model = neighbors.KNeighborsClassifier()
k_fold = KFold(len(Y), n_folds=6, shuffle=True, random_state=0)
clf = GridSearchCV(model, parameters, cv=k_fold)  # this is the line I change below
clf.fit(X, Y)

print(clf.best_score_)

But when I change the line to

clf = GridSearchCV(model, parameters, cv=k_fold, scoring=accuracy_score) # or accuracy_score()

I get the error "ValueError: Cannot have number of folds n_folds=10 greater than the number of samples: 6.", which I don't think reflects the real problem.

I think the real problem is that accuracy_score does not follow the scorer(estimator, X, y) signature described in the documentation.
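To illustrate the signature mismatch: accuracy_score takes (y_true, y_pred), while GridSearchCV calls a scoring callable as scorer(estimator, X, y). A minimal sketch of a hand-written adapter is below; the name accuracy_scorer is hypothetical, and the example uses the current sklearn.neighbors import path.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier


def accuracy_scorer(estimator, X, y):
    """Hypothetical adapter with the (estimator, X, y) signature GridSearchCV expects.

    It predicts on X with the fitted estimator, then delegates to
    accuracy_score, which itself only accepts (y_true, y_pred).
    """
    y_pred = estimator.predict(X)
    return accuracy_score(y, y_pred)


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([0, 1, 0, 0, 0, 1])

# With n_neighbors=1, every training point is its own nearest neighbour,
# so accuracy on the training data is 1.0.
model = KNeighborsClassifier(n_neighbors=1).fit(X, Y)
print(accuracy_scorer(model, X, Y))
```

A function like this can be passed as scoring=accuracy_scorer, because GridSearchCV only needs a callable that returns a number where higher is better.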


So how can I fix this problem?

Upvotes: 3

Views: 8257

Answers (2)

maxymoo

Reputation: 36545

It will work if you change scoring=accuracy_score to scoring='accuracy' (see the documentation for the full list of scorers you can use by name in this way.)
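A sketch of the question's grid search using scoring='accuracy'. It assumes a current scikit-learn, where KFold and GridSearchCV live in sklearn.model_selection and KFold takes n_splits instead of the old (n, n_folds) arguments; the sklearn.cross_validation and sklearn.grid_search modules used in the question no longer exist there.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([0, 1, 0, 0, 0, 1])

parameters = {
    'n_neighbors': [2, 3, 4],
    'weights': ['uniform', 'distance'],
    'p': [1, 2, 3],
}

# Modern KFold: the number of samples is no longer passed in; n_splits replaces n_folds.
k_fold = KFold(n_splits=6, shuffle=True, random_state=0)

# Refer to the built-in accuracy scorer by name instead of passing accuracy_score.
clf = GridSearchCV(KNeighborsClassifier(), parameters, cv=k_fold, scoring='accuracy')
clf.fit(X, Y)
print(clf.best_score_, clf.best_params_)
```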

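In theory you can pass a custom scoring callable like you're trying to, but I think you're right that accuracy_score doesn't have the right API: it takes (y_true, y_pred), whereas GridSearchCV calls scorer(estimator, X, y). Wrapping it with sklearn.metrics.make_scorer(accuracy_score) turns it into a compatible scorer.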
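A minimal sketch, assuming a current scikit-learn, showing that make_scorer converts a (y_true, y_pred) metric such as accuracy_score into an (estimator, X, y) scorer, and that it produces the same cross-validation scores as the named 'accuracy' scorer on the question's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([0, 1, 0, 0, 0, 1])
parameters = {'n_neighbors': [2, 3, 4], 'weights': ['uniform', 'distance'], 'p': [1, 2, 3]}

# Same shuffled splits for both searches (fixed random_state).
k_fold = KFold(n_splits=6, shuffle=True, random_state=0)

# make_scorer wraps accuracy_score so GridSearchCV can call it as scorer(estimator, X, y).
wrapped = GridSearchCV(KNeighborsClassifier(), parameters, cv=k_fold,
                       scoring=make_scorer(accuracy_score)).fit(X, Y)
by_name = GridSearchCV(KNeighborsClassifier(), parameters, cv=k_fold,
                       scoring='accuracy').fit(X, Y)

print(wrapped.best_score_, by_name.best_score_)
```

The named string is the shorter option for built-in metrics; make_scorer is the route for metrics that need extra arguments or have no predefined name.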

Upvotes: 7

Bharat Ram Ammu

Reputation: 184

Here is an example of using quadratic weighted kappa as the scoring metric in GridSearchCV for a simple random forest model. The key lesson for me was that the metric's own parameters (here weights="quadratic") are passed to the make_scorer function, which forwards them to the metric.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import cohen_kappa_score, make_scorer

# Extra keyword arguments to make_scorer are forwarded to cohen_kappa_score
kappa_scorer = make_scorer(cohen_kappa_score, weights="quadratic")
# Create the parameter grid based on the results of random search
param_grid = {
    'bootstrap': [True],
    'max_features': range(2, 10),  # try 2 to 9 features
    'min_samples_leaf': [3, 4, 5],
    'n_estimators': [100, 300, 500],
    'max_depth': [5]
    }
# Create a base model
random_forest = RandomForestClassifier(class_weight="balanced_subsample", random_state=1)
# Instantiate the grid search model
grid_search = GridSearchCV(estimator=random_forest, param_grid=param_grid,
                           cv=5, n_jobs=-1, verbose=2, scoring=kappa_scorer)  # search for the best model by quadratic weighted kappa

# Fit the grid search to the training data (final_tr: features, yTrain: labels)
grid_search.fit(final_tr, yTrain)
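A small check of how make_scorer forwards weights="quadratic": calling the scorer as scorer(estimator, X, y) gives the same value as calling cohen_kappa_score directly on the estimator's predictions. The data here is synthetic (three ordinal classes generated from random features) and stands in for final_tr/yTrain, which are not shown in the answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer

# Synthetic ordinal labels 0, 1, 2 (hypothetical data, not from the answer).
rng = np.random.RandomState(0)
X = rng.rand(60, 4)
y = (X[:, 0] + X[:, 1] > 1).astype(int) + (X[:, 2] > 0.5).astype(int)

kappa_scorer = make_scorer(cohen_kappa_score, weights="quadratic")

# Fit on the first 40 rows, score on the remaining 20.
model = RandomForestClassifier(n_estimators=10, random_state=1).fit(X[:40], y[:40])

# The scorer predicts on X and calls cohen_kappa_score(y_true, y_pred, weights="quadratic").
via_scorer = kappa_scorer(model, X[40:], y[40:])
direct = cohen_kappa_score(y[40:], model.predict(X[40:]), weights="quadratic")
print(via_scorer, direct)
```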

Upvotes: 1
