Moritz Schwarz

Reputation: 2489

tidymodels recipe: Using all_of to select variables stored in a vector

I would like to pass a vector of column names to various step functions in the tidymodels recipes package. My first attempt was simply the following (prep and juice are used here only for illustration):

library(tidymodels)
library(modeldata)
data(biomass)

remove_vector <- c("oxygen","nitrogen")

test_recipe <- recipe(HHV ~ .,data = biomass) %>%
  step_rm(remove_vector)

test_recipe %>% 
  prep %>% 
  juice %>% 
  head

But this produces the following note:

Note: Using an external vector in selections is ambiguous.
i Use `all_of(remove_vector)` instead of `remove_vector` to silence this message.
i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.

This, of course, concerns me (I want my code to run without triggering messages like this), even though I still get the outcome I want.

However, when I follow the message's advice and use all_of:

test_recipe <- recipe(HHV ~ .,data = biomass) %>%
  step_rm(all_of(remove_vector))

test_recipe %>% 
  prep %>% 
  juice %>% 
  head

I get the error message:

Error: Not all functions are allowed in step function selectors (e.g. all_of). See ?selections.

In ?selections, I can't find any reference to this exact (seemingly simple) problem.

Any ideas? Many thanks!

Upvotes: 0

Views: 1249

Answers (1)

mihagazvoda

Reputation: 1367

If you use quasiquotation you won't get a warning:

library(tidymodels)
library(modeldata)
data(biomass)

remove_vector <- c("oxygen", "nitrogen")

test_recipe <- recipe(HHV ~ .,data = biomass) %>%
  step_rm(!!!syms(remove_vector))

test_recipe %>% 
  prep %>% 
  juice %>% 
  head

More on the warning: the ambiguity arises when the vector has the same name as one of your columns. For example:

oxygen <- c("oxygen","nitrogen")

test_recipe <- recipe(HHV ~ .,data = biomass) %>%
  step_rm(oxygen)

This will remove only the oxygen column, because the bare name is resolved to the column rather than to the vector. However, if you use !!!syms(oxygen), both columns will be removed.
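To check the difference yourself, here is a minimal sketch that bakes both recipes and looks at which of the two columns survive. The object names by_name and by_vector are just illustrative, and bake(new_data = NULL) is used in place of juice on the assumption that your recipes version supports it:

```r
library(tidymodels)
library(modeldata)
data(biomass)

# A vector that deliberately shares its name with a column of biomass.
oxygen <- c("oxygen", "nitrogen")

# Bare name: the selector resolves `oxygen` to the column, not the vector.
by_name <- recipe(HHV ~ ., data = biomass) %>%
  step_rm(oxygen) %>%
  prep() %>%
  bake(new_data = NULL)

# Spliced symbols: the vector's contents are used as the column names.
by_vector <- recipe(HHV ~ ., data = biomass) %>%
  step_rm(!!!rlang::syms(oxygen)) %>%
  prep() %>%
  bake(new_data = NULL)

# Which of the two columns are still present in each result?
intersect(c("oxygen", "nitrogen"), names(by_name))
intersect(c("oxygen", "nitrogen"), names(by_vector))
```

Calling syms with the rlang:: prefix just makes it explicit where the function comes from; the unprefixed call above should behave the same way once tidymodels is attached.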

Upvotes: 3
