I was able to use the following call to cross-validate a model on binary data, but it does not work for multiclass data:
> cross_validation.cross_val_score(alg, X, y, cv=cv_folds, scoring='roc_auc')
/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/metrics/scorer.py in __call__(self, clf, X, y, sample_weight)
169 y_type = type_of_target(y)
170 if y_type not in ("binary", "multilabel-indicator"):
--> 171 raise ValueError("{0} format is not supported".format(y_type))
172
173 if is_regressor(clf):
ValueError: multiclass format is not supported
> y.head()
0 10
1 6
2 12
3 6
4 10
Name: rank, dtype: int64
> type(y)
pandas.core.series.Series
I also tried changing the scoring from roc_auc to f1, but I still get an error:
/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight)
1016 else:
1017 raise ValueError("Target is %s but average='binary'. Please "
-> 1018 "choose another average setting." % y_type)
1019 elif pos_label not in (None, 1):
1020 warnings.warn("Note that pos_label (set to %r) is ignored when "
ValueError: Target is multiclass but average='binary'. Please choose another average setting.
Is there a method I can use to cross-validate a model on this type of data?
Reputation: 5587
As Vivek Kumar pointed out in the comments, sklearn metrics support multiclass averaging for both the F1 score and the ROC computations, albeit with some limitations when the data is unbalanced. So you can either construct the scorer manually with the corresponding average parameter, or use one of the predefined scorers (e.g. 'f1_micro', 'f1_macro', 'f1_weighted').
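Both options can be sketched as follows. This is only a minimal illustration, not the asker's setup: the iris dataset and LogisticRegression stand in for the original X, y and alg, and cv=5 is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score

# Assumption: a small 3-class dataset and a simple classifier,
# used here only as stand-ins for the question's X, y and alg.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Option 1: a predefined scorer name with multiclass averaging.
scores_named = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")

# Option 2: build the scorer manually and pass the average parameter.
macro_f1 = make_scorer(f1_score, average="macro")
scores_custom = cross_val_score(clf, X, y, cv=5, scoring=macro_f1)

print(scores_named.mean(), scores_custom.mean())
```

With an integer cv and a classifier, the folds are stratified and not shuffled, so both calls score the same splits. Swapping average="macro" for "micro" or "weighted" changes how the per-class scores are combined.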
If multiple scores are needed, use cross_validate instead of cross_val_score (available since sklearn 0.19 in the module sklearn.model_selection).
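A hedged sketch of the multi-score variant: the dataset and classifier are again stand-ins rather than the asker's data. The 'roc_auc_ovr' scorer name is an addition that assumes sklearn 0.22 or later, where multiclass ROC AUC was introduced; on older versions, drop it from the list.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Assumption: stand-in data and model, not the question's X, y and alg.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Several multiclass-capable scorers evaluated in one pass.
# 'roc_auc_ovr' assumes sklearn >= 0.22 and a model with predict_proba.
scoring = ["f1_micro", "f1_macro", "f1_weighted", "roc_auc_ovr"]

results = cross_validate(clf, X, y, cv=5, scoring=scoring)

# Each metric is returned under the key 'test_<scorer name>'.
for name in scoring:
    print(name, results["test_" + name].mean())
```

The returned dictionary also contains fit_time and score_time arrays, one entry per fold.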