Reputation: 23
I have a pretty simple problem where my outcome is binary and I am trying to use logistic regression (using tidymodels) to classify based on a few predictors (some of which are well known to be good predictors).
I coded the factor outcome as 0 and 1 (1 = positive, which is the class I am mostly interested in).
When I run the predict function with both type = "class" and type = "prob" I get columns named .pred_class, .pred_0, and .pred_1.
Then when, for example, plotting the ROC curve I am wondering whether I should use
roc1 <- roc_curve(data_test_pred, outcome, .pred_1)
or
roc1 <- roc_curve(data_test_pred, outcome, .pred_0).
The first (which I would have thought was correct) gives a bad ROC curve below the diagonal and the second gives a decent ROC curve.
So, I am just not understanding what is going on here and I'm not sure how to proceed.
Upvotes: -1
Views: 193
Reputation: 226
yardstick uses the first factor level as the event by default. So if your outcome is a factor with levels c(0, 1), then yardstick takes the first level, 0, as the event level. This matches what you are seeing: you get a reasonable curve when supplying .pred_0 as the column with the class probabilities for the event.
If you want to use the second factor level as the event level, you can set event_level = "second" in roc_curve(); see also https://yardstick.tidymodels.org/reference/roc_auc.html#relevant-level.
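To make the effect concrete, here is a minimal sketch on made-up data. The column names data_test_pred, outcome, .pred_0 and .pred_1 are taken from the question; the six toy probabilities are invented purely for illustration and are not from any real model.

```r
library(yardstick)

# Hypothetical toy predictions; the values are made up for illustration.
data_test_pred <- data.frame(
  outcome = factor(c("0", "0", "0", "1", "1", "1"), levels = c("0", "1")),
  .pred_1 = c(0.1, 0.2, 0.6, 0.4, 0.8, 0.9)
)
data_test_pred$.pred_0 <- 1 - data_test_pred$.pred_1

# Default: the first level ("0") is the event, so pass .pred_0.
auc_first <- roc_auc(data_test_pred, outcome, .pred_0)

# Treat the second level ("1") as the event and pass .pred_1.
auc_second <- roc_auc(data_test_pred, outcome, .pred_1, event_level = "second")

# Mismatch: event is still "0" but we pass the probabilities for "1".
# This is the case that produces a curve below the diagonal.
auc_mismatch <- roc_auc(data_test_pred, outcome, .pred_1)

# The same argument works for the curve itself.
roc1 <- roc_curve(data_test_pred, outcome, .pred_1, event_level = "second")
```

In this sketch auc_first and auc_second should agree, while auc_mismatch is one minus that value, which is why the curve in the question looked flipped rather than random.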
Upvotes: 1