inprogress123

Reputation: 23

I'm not understanding .pred_class in classification (using logistic regression)

I have a pretty simple problem where my outcome is binary and I am trying to use logistic regression (using tidymodels) to classify based on a few predictors (some of which are well known to be good predictors).

I coded the factor outcome as 0 and 1 (1 = positive, and that is what I am mostly interested in).

When I run the predict function with both type = "class" and type = "prob", I get columns named .pred_class, .pred_0, and .pred_1.

Then, when plotting the ROC curve for example, I am wondering whether I should use

roc1 <- roc_curve(data_test_pred, outcome, .pred_1)

or

roc1 <- roc_curve(data_test_pred, outcome, .pred_0)

The first (which I would have thought was correct) gives a bad ROC curve below the diagonal and the second gives a decent ROC curve.

So, I am just not understanding what is going on here and I'm not sure how to proceed.

Upvotes: -1

Views: 193

Answers (1)

hannahfrick

Reputation: 226

By default, yardstick treats the first factor level as the event. So if your outcome is a factor with levels c(0, 1), yardstick takes the first level, 0, as the event level. That is why you get a reasonable curve when you supply .pred_0 as the column of class probabilities for the event, and a curve below the diagonal when you supply .pred_1.

If you want to use the second factor level as the event level, you can set event_level = "second" in roc_curve(); see also https://yardstick.tidymodels.org/reference/roc_auc.html#relevant-level.
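As a rough illustration, here is a minimal sketch of the three cases. The data frame is simulated and only stands in for your data_test_pred (the simulation, the seed, and the auc_* names are assumptions for the example, not your data); the column names follow your question.

```r
library(yardstick)

# Hypothetical toy data standing in for data_test_pred.
set.seed(1)
n <- 200
truth <- rbinom(n, 1, 0.4)
# Simulated probability of class 1 that tends to be higher when truth == 1.
p1 <- plogis(2 * truth - 1 + rnorm(n))

data_test_pred <- data.frame(
  outcome = factor(truth, levels = c(0, 1)),  # first level is "0"
  .pred_0 = 1 - p1,
  .pred_1 = p1
)

# Default: the first level ("0") is the event, so .pred_0 is the matching column.
auc_first <- roc_auc(data_test_pred, outcome, .pred_0)

# Mismatch: default event level ("0") paired with the probabilities for "1".
# This is the situation that produces a curve below the diagonal.
auc_mismatch <- roc_auc(data_test_pred, outcome, .pred_1)

# Treat the second level ("1") as the event and pass its probabilities.
auc_second <- roc_auc(data_test_pred, outcome, .pred_1, event_level = "second")
roc1 <- roc_curve(data_test_pred, outcome, .pred_1, event_level = "second")
```

In this sketch auc_first and auc_second describe the same classifier from the two points of view, while auc_mismatch is the mirrored value. For your case, the last call is probably the one you want, since 1 is the class you care about.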

Upvotes: 1
