Reputation: 1423
I tried to use GridSearchCV for multi-class case based on the answer from here:
But I got value error, multiclass format is not supported.
How can I use this method for multi-class case?
Following code is from the answer in above link.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import accuracy_score, recall_score, f1_score, roc_auc_score, make_scorer
X, y = make_classification(n_samples=3000, n_features=5, weights=[0.1, 0.9, 0.3])
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf', class_weight='auto'))
param_space = dict(svc__C=np.logspace(-5,0,5), svc__gamma=np.logspace(-2, 2, 10))
accuracy_score, recall_score, roc_auc_score
my_scorer = make_scorer(roc_auc_score, greater_is_better=True)
gscv = GridSearchCV(pipe, param_space, scoring=my_scorer)
gscv.fit(X, y)
print gscv.best_params_
Upvotes: 9
Views: 16873
Reputation: 633
It supports multi-class naturally if the classifier has the correct API by default for y_true
and y_pred
/y_score
.
Otherwise, one has to do some customization using the score function like make_scorer
.
For common metrics like AUROC for multi-classes, sklearn offers the 'roc_auc_ovr'
, where it actually refers to
roc_auc_ovr_scorer = make_scorer(roc_auc_score, needs_proba=True,
multi_class='ovr')
as in the source file.
To deal with multi-class problem with a classifier like e.g.,LogisticRegression
, ovr
is required and y_true
is in the format of categorical values. The above setting will work directly.
Some other metrics for binary classifications can also be extended by wrapping the respective function. E.g., average_precision_score
can be wrapped as
from sklearn.preprocessing import OneHotEncoder
def multi_auprc(y_true_cat, y_score):
y_true = OneHotEncoder().fit_transform(y_true_cat.reshape(-1, 1)).toarray()
return average_precision_score(y_true, y_score)
The metric can then be defined for GridsearchCV as
{
'auprc': make_scorer(multi_auprc, needs_proba=True, greater_is_better=True)
}
Upvotes: 1
Reputation: 89
It supports multi-class
You can set the para of scoring = f1.macro
, example:
gsearch1 = GridSearchCV(estimator = est1, param_grid=params_test1, scoring='f1_macro', cv=5, n_jobs=-1)
Or scoring = 'roc_auc_ovr'
Upvotes: 5
Reputation: 2487
From the documentation on roc_auc_score:
Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format.
By "label indicator format", they mean each label value is represented as a binary column (rather than as a unique target value in a single column). You don't want to do that for your predictor, as it could result in non-mutually-exclusive predictions (i.e., predicting both label 2 and 4 for case p1, or predicting no labels for case p2).
Pick or custom-implement a scoring function that is well-defined for the multiclass problem, such as F1 score. Personally I find informedness more convincing than F1 score, and easier to generalize to the multiclass problem than roc_auc_score.
Upvotes: 8