叶全伟

Reputation: 21

How do I set which level is the "event" in my outcome variable using tidymodels?

I am using tidymodels for machine learning and I want to predict a binary response/outcome. How do I specify which level of the outcome is the "event" or positive case?

Does this happen in the recipe, or somewhere else?


##split the data
anxiety_split <- initial_split(anxiety_df, strata = anxiety)


anxiety_train <- training(anxiety_split)
anxiety_test <- testing(anxiety_split)


set.seed(1222) 
anxiety_cv <- vfold_cv(anxiety_train, strata = anxiety)

anxiety_rec <- recipe(anxiety ~ ., data = anxiety_train, positive = 'pos') %>%
  step_corr(all_numeric()) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_zv(all_numeric()) %>%
  step_normalize(all_numeric())

Upvotes: 2

Views: 1422

Answers (1)

Julia Silge

Reputation: 11623

You don't need to set which level of your outcome variable is the "event" until it is time to evaluate your model. You can do this using the event_level argument of most yardstick functions. For example, check out how to do this for yardstick::roc_curve():

library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
library(tidyverse)

data(two_class_example)


## looks good!
two_class_example %>%
  roc_curve(truth, Class1, event_level = "first") %>%
  autoplot()



## YIKES!! we got this backwards
two_class_example %>%
  roc_curve(truth, Class1, event_level = "second") %>%
  autoplot()

Created on 2020-08-02 by the reprex package (v0.3.0.9001)

Notice the message on startup for yardstick; the first factor level is assumed to be the event. This is similar to how base R acts. You only need to worry about event_level if your "event" is not the first factor level.
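If your event is the second level, there are two ways to handle it, sketched below with the same two_class_example data. This is only an illustrative sketch: the variable names auc_second, releveled, and auc_releveled are made up for the example, and it assumes a yardstick version that supports the event_level argument. Either pass event_level = "second" to the metric, or reorder the factor with base R's relevel() so the event becomes the first level.

```r
library(yardstick)

data(two_class_example)

# The first level of the truth factor is treated as the event by default.
levels(two_class_example$truth)

# Option 1: leave the factor alone and tell the metric which level is the event.
# Class2 is the second level, so its probability column is the one passed in.
auc_second <- roc_auc(two_class_example, truth, Class2, event_level = "second")

# Option 2: reorder the factor so the desired event is the first level,
# then rely on the default event_level = "first".
releveled <- two_class_example
releveled$truth <- relevel(releveled$truth, ref = "Class2")
auc_releveled <- roc_auc(releveled, truth, Class2)

auc_second
auc_releveled
```

Both calls treat Class2 as the event, so they should report the same estimate. Whichever option you choose, the probability column you pass has to belong to the level you are treating as the event.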

Upvotes: 8
