MatrixMummy

Reputation: 183

scikit-learn: Is the cross validation score evaluating the log loss function?

In Python's scikit-learn, I'm using stochastic gradient descent to perform multiclass classification, minimizing the log loss function.

from sklearn.linear_model import SGDClassifier
clf = SGDClassifier(loss="log", penalty="l2")

When I perform cross-validation, for each split of the data I compute:

score = clf.fit(X_train, y_train).score(X_test, y_test)
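Expanded, the per-split computation looks roughly like this (a minimal sketch using KFold and made-up synthetic data, just to show the shape of the loop; the names and numbers are illustrative):

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import KFold

# Synthetic multiclass data, only to make the sketch self-contained.
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

# loss="log" selects the logistic (log loss) objective; newer scikit-learn
# versions spell it "log_loss".
clf = SGDClassifier(loss="log", penalty="l2")

# One score per cross-validation split.
for train_idx, test_idx in KFold(n_splits=5).split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    score = clf.fit(X_train, y_train).score(X_test, y_test)
    print(score)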

Is the score the evaluation of the loss function?

For each cross validation split, my score is always 0.0. So does that mean my classifier correctly labeled my test data OR does it mean my accuracy is very low?

Upvotes: 3

Views: 2725

Answers (1)

Ibraim Ganiev

Reputation: 9410

That score comes from the classifier's score method, and it isn't related to the loss function. From the documentation:

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Internally it uses the accuracy_score function:

Accuracy classification score.

In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.
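In other words, the two calls below should give the same number (a small check, assuming a fitted clf and the held-out X_test, y_test from the question):

from sklearn.metrics import accuracy_score

# clf.score() is mean accuracy: the fraction of test samples predicted exactly right.
acc_via_score = clf.score(X_test, y_test)
acc_via_metric = accuracy_score(y_test, clf.predict(X_test))
print(acc_via_score, acc_via_metric)  # identical values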

A score of 0.0 means that your classifier did not classify a single sample from X_test correctly.
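If what you actually want is the log loss, compute it explicitly from predicted probabilities, or let cross_val_score do the splitting with the "neg_log_loss" scorer (negated so that larger is better). A rough sketch, assuming the clf, X, y and train/test splits from the question:

from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_score

# Log loss on one split, computed by hand from predicted probabilities.
proba = clf.fit(X_train, y_train).predict_proba(X_test)
print(log_loss(y_test, proba))

# Or per-split scores over the whole dataset; these are negated log losses.
print(cross_val_score(clf, X, y, cv=5, scoring="neg_log_loss"))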

Upvotes: 1
