Borys

Reputation: 1423

Does GridSearchCV not support multi-class?

I tried to use GridSearchCV for a multi-class problem, based on the answer here:

Accelerating the prediction

But I got a ValueError: multiclass format is not supported.

How can I use this method for the multi-class case?

The following code is from the answer in the link above.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, recall_score, f1_score, roc_auc_score, make_scorer

X, y = make_classification(n_samples=3000, n_features=5, n_informative=3, n_classes=3, weights=[0.1, 0.9, 0.3])

pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf', class_weight='balanced'))

param_space = dict(svc__C=np.logspace(-5,0,5), svc__gamma=np.logspace(-2, 2, 10))

# roc_auc_score is applied to predict() output here, which fails for multiclass targets
my_scorer = make_scorer(roc_auc_score, greater_is_better=True)

gscv = GridSearchCV(pipe, param_space, scoring=my_scorer)
gscv.fit(X, y)

print(gscv.best_params_)

Upvotes: 9

Views: 16873

Answers (3)

Moore

Reputation: 633

GridSearchCV supports multi-class problems directly when the classifier's default output (y_pred or y_score) and y_true are already in the format the metric expects.

Otherwise, the scoring function has to be customized, for example with make_scorer.

For common multi-class metrics such as AUROC, sklearn provides the built-in scorer name 'roc_auc_ovr', which is defined as

roc_auc_ovr_scorer = make_scorer(roc_auc_score, needs_proba=True,
                                 multi_class='ovr')

as in the source file.

For a multi-class problem with a classifier such as LogisticRegression, multi_class='ovr' is required, and y_true is given as categorical labels (one class label per sample). The above setting will work directly.
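As a rough illustration of the built-in scorer, the sketch below passes the string 'roc_auc_ovr' to GridSearchCV. The synthetic dataset, the pipeline, and the grid of C values are assumptions chosen for the example, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data (illustrative values, not from the question)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}

# 'roc_auc_ovr' scores the predict_proba output one-vs-rest, so y stays a
# single column of class labels
gscv = GridSearchCV(pipe, param_grid, scoring="roc_auc_ovr", cv=3)
gscv.fit(X, y)

print(gscv.best_params_, gscv.best_score_)
```

The classifier has to expose predict_proba for this scorer, which LogisticRegression does by default.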

Some other binary-classification metrics can also be extended by wrapping the respective function. For example, average_precision_score can be wrapped as

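import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.preprocessing import OneHotEncoder


def multi_auprc(y_true_cat, y_score):
    # One-hot encode the class labels so they match the (n_samples, n_classes) score matrix
    y_true = OneHotEncoder().fit_transform(np.asarray(y_true_cat).reshape(-1, 1)).toarray()

    return average_precision_score(y_true, y_score)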

The metric can then be defined for GridSearchCV as

{
    'auprc': make_scorer(multi_auprc, needs_proba=True, greater_is_better=True)
}
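A scoring dictionary like this can be handed to GridSearchCV together with refit. The sketch below is one possible variant, not the code from this answer: instead of make_scorer it uses a plain callable with the (estimator, X, y) signature, and the helper name auprc_scorer, the dataset, and the grid are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import label_binarize


def auprc_scorer(estimator, X, y):
    # Probabilities have one column per class, ordered like estimator.classes_
    y_score = estimator.predict_proba(X)
    # Binarize the labels against the same class order so the columns line up
    y_true = label_binarize(y, classes=estimator.classes_)
    return average_precision_score(y_true, y_score, average="macro")


# Synthetic 3-class data (illustrative values only)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

gscv = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1.0, 10.0]},
    scoring={"auprc": auprc_scorer, "auroc": "roc_auc_ovr"},
    refit="auprc",  # with several metrics, refit must name the one to optimize
    cv=3,
)
gscv.fit(X, y)

print(gscv.best_params_)
print(gscv.cv_results_["mean_test_auprc"])
```

Binarizing against estimator.classes_ keeps the label columns aligned with the probability columns even if a validation fold happens to miss a class.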

Upvotes: 1

zhe zheng

Reputation: 89

It supports multi-class. You can set the scoring parameter to 'f1_macro', for example:

gsearch1 = GridSearchCV(estimator = est1, param_grid=params_test1, scoring='f1_macro', cv=5, n_jobs=-1)

Or use scoring='roc_auc_ovr'.
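Applied to a pipeline like the one in the question, the 'f1_macro' option might look like the sketch below. The dataset parameters and the smaller grid are assumptions picked to keep the example quick. As far as I can tell, 'roc_auc_ovr' would additionally need probability estimates, so SVC would have to be created with probability=True for that option.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class data (illustrative values only)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
param_space = {"svc__C": np.logspace(-2, 1, 4),
               "svc__gamma": np.logspace(-2, 1, 4)}

# 'f1_macro' averages the per-class F1 scores, so it accepts the multiclass
# labels returned by predict()
gscv = GridSearchCV(pipe, param_space, scoring="f1_macro", cv=3)
gscv.fit(X, y)

print(gscv.best_params_, gscv.best_score_)
```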

Upvotes: 5

Andreus

Reputation: 2487

From the documentation on roc_auc_score:

Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format.

By "label indicator format", they mean each label value is represented as a binary column (rather than as a unique target value in a single column). You don't want to do that for your predictor, as it could result in non-mutually-exclusive predictions (i.e., predicting both label 2 and 4 for case p1, or predicting no labels for case p2).
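To make the format concrete, here is a small sketch that converts a single column of class labels into label indicator format with label_binarize. The four example labels are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

y = np.array([0, 2, 1, 2])  # one class label per sample

# One binary column per class; each row has a 1 in the column of its label
indicator = label_binarize(y, classes=[0, 1, 2])
print(indicator)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]]
```

Here every row contains exactly one 1 because it was derived from single labels. A predictor trained on such columns independently has no such guarantee.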

Pick or custom-implement a scoring function that is well-defined for the multiclass problem, such as F1 score. Personally I find informedness more convincing than F1 score, and easier to generalize to the multiclass problem than roc_auc_score.
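One way to get something close to informedness with stock scikit-learn is the chance-adjusted balanced accuracy. The scikit-learn documentation describes it as equal to informedness (Youden's J) in the binary case. Using it as the multiclass generalization is an assumption of this sketch and may not be the exact definition meant above. The scorer name informedness_like, the dataset, and the grid are likewise illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Chance-adjusted balanced accuracy: 0 for random guessing, 1 for perfect
# predictions; used here as a stand-in for multiclass informedness
informedness_like = make_scorer(balanced_accuracy_score, adjusted=True)

# Synthetic 3-class data (illustrative values only)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

gscv = GridSearchCV(SVC(kernel="rbf"), {"C": [0.1, 1.0, 10.0]},
                    scoring=informedness_like, cv=3)
gscv.fit(X, y)

print(gscv.best_params_, gscv.best_score_)
```

Because the scorer works on predict() output, it needs neither probability estimates nor label indicator targets.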

Upvotes: 8
