Here is my toy data with code. How can I discretize the numeric variables without losing the original ones?
library(gapminder); library(tidyverse); library(tidymodels)
gapminder %>%
recipe(lifeExp ~ .) %>%
step_discretize(all_numeric(), -all_outcomes(), options = list(cuts = 10)) %>%
prep() %>%
juice()
With the code above I lose the original values of pop and gdpPercap, because they are replaced by their discretized versions. How can I keep both the original numeric values and the discretized variables?
Secondly, instead of bin01, bin02, etc., is there a way to get bins labelled like [0-100], [101-150], etc., so I know which values each bin covers?
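A minimal base-R sketch of both requests, for illustration only: each numeric column is copied into a new `bin_` column with `cut()`, so the original column is kept, and `cut()` labels every level with its interval bounds instead of bin01, bin02. The data frame `df` and its values are hypothetical toy data, not gapminder. With a single number for `breaks`, `cut()` uses equal-width intervals, similar in spirit to `method = "interval"` in arules.

```r
# Hypothetical toy data standing in for gapminder's numeric columns
df <- data.frame(pop = c(120, 350, 800, 1500, 4200, 9000),
                 gdpPercap = c(500, 900, 1200, 3000, 7500, 20000))

# Find the numeric columns before adding any new ones
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

for (col in num_cols) {
  # breaks = 3 splits the column's range into 3 equal-width intervals;
  # each factor level is labelled with its bounds, e.g. "(a,b]".
  # dig.lab widens the number of digits shown in those labels.
  df[[paste0("bin_", col)]] <- cut(df[[col]], breaks = 3, dig.lab = 6)
}

str(df)  # original pop/gdpPercap plus bin_pop/bin_gdpPercap factors
```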
I'm not sure how to do this with step_discretize without left-joining the data back again, but the discretize() function from the arules package produces bins labelled with their value ranges. Here is what worked for me:
gapminder %>%
mutate(across(where(is.numeric),
~arules::discretize(x = .x, method = "interval", breaks = 10),
.names = "bin_{col}"))
If you know how to do this within a recipe, do let me know.
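One possible way to keep the originals inside a recipe, sketched under assumptions: copy the columns with `step_mutate()` first, then discretize only the copies. The new names `pop_bin` and `gdpPercap_bin` are hypothetical. This assumes a recent recipes release in which `step_discretize()` accepts `num_breaks` (older releases used `options = list(cuts = 10)` as in the question), and it uses `bake(new_data = NULL)`, the current replacement for `juice()`. The copies still get bin01, bin02 style labels, so the `cut()` or arules approaches are needed for interval labels.

```r
library(gapminder)
library(recipes)

rec <- recipe(lifeExp ~ ., data = gapminder)

# Duplicate the columns so the originals survive discretization
rec <- step_mutate(rec, pop_bin = pop, gdpPercap_bin = gdpPercap)

# Discretize only the copies into 10 quantile-based bins
rec <- step_discretize(rec, pop_bin, gdpPercap_bin, num_breaks = 10)

# Train the recipe and return the processed training data
out <- bake(prep(rec), new_data = NULL)
```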