Deqing

Reputation: 14652

How to do cross validation for multiclass data?

I was able to use the following call to run cross validation on binary data, but it does not work for multiclass data:

> cross_validation.cross_val_score(alg, X, y, cv=cv_folds, scoring='roc_auc')

/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/metrics/scorer.py in __call__(self, clf, X, y, sample_weight)
    169         y_type = type_of_target(y)
    170         if y_type not in ("binary", "multilabel-indicator"):
--> 171             raise ValueError("{0} format is not supported".format(y_type))
    172 
    173         if is_regressor(clf):

ValueError: multiclass format is not supported

> y.head()

0    10
1     6
2    12
3     6
4    10
Name: rank, dtype: int64

> type(y)

pandas.core.series.Series

I also tried changing roc_auc to f1, but I still get an error:

/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight)
   1016         else:
   1017             raise ValueError("Target is %s but average='binary'. Please "
-> 1018                              "choose another average setting." % y_type)
   1019     elif pos_label not in (None, 1):
   1020         warnings.warn("Note that pos_label (set to %r) is ignored when "

ValueError: Target is multiclass but average='binary'. Please choose another average setting.

Is there a method I can use to do cross validation for this type of data?

Upvotes: 3

Views: 11859

Answers (1)

rll

Reputation: 5587

As Vivek Kumar pointed out in a comment, sklearn metrics support multiclass averaging for both the F1 score and the ROC computations, albeit with some limitations when the data is unbalanced. So you can either construct the scorer manually with the appropriate average parameter, or use one of the predefined scorers (e.g. 'f1_micro', 'f1_macro', 'f1_weighted').
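A minimal sketch of both options for the F1 case. The iris dataset and LogisticRegression are only stand-ins for your own X, y and alg, and the imports assume the sklearn.model_selection module (the older sklearn.cross_validation module used in the question was removed in later releases):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score

# Stand-in multiclass data (3 classes) and estimator; replace with your own.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Option 1: build the scorer manually and pass the averaging strategy.
macro_f1 = make_scorer(f1_score, average="macro")
scores_custom = cross_val_score(clf, X, y, cv=5, scoring=macro_f1)

# Option 2: use the predefined scorer name.
scores_named = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")

print(scores_custom.mean(), scores_named.mean())
```

Both calls should give the same per-fold scores here. Which average to pick ('micro', 'macro' or 'weighted') depends on how much weight you want rare classes to carry.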

If you need multiple scores, use cross_validate instead of cross_val_score (available since sklearn 0.19 in the sklearn.model_selection module).
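For example, something along these lines should evaluate several F1 averages in one pass. Again, the dataset and estimator are placeholders rather than the asker's actual setup:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Stand-in multiclass data and estimator; replace with your own X, y and alg.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Several predefined scorers evaluated on the same folds.
scoring = ["f1_micro", "f1_macro", "f1_weighted"]
results = cross_validate(clf, X, y, cv=5, scoring=scoring)

# Results are keyed as "test_<scorer name>", one array entry per fold.
for name in scoring:
    print(name, results["test_" + name].mean())
```

The returned dictionary also contains fit_time and score_time entries alongside the per-scorer test scores.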

Upvotes: 2
