Geet

Reputation: 2575

How can I discretize the numeric variables without losing the original ones?

Here is my toy data with code. How can I discretize the numeric variables without losing the original ones?

library(gapminder); library(tidyverse); library(tidymodels)

gapminder %>% 
  recipe(lifeExp ~ .) %>% 
  step_discretize(all_numeric(), -all_outcomes(), options = list(cuts = 10)) %>% 
  prep() %>% 
  juice()

With the code above I lose the original values of pop and gdpPercap, because they are replaced by their discretized versions. How can I keep both the original numeric values and the discretized variables?

Secondly, instead of bin01, bin02, is there a way to get bins labelled like [0-100], [101-150], etc., so I know which values each bin holds?

Upvotes: 1

Views: 321

Answers (1)

Geet

Reputation: 2575

I'm not sure how to do this with step_discretize without left-joining the original data back on, but the discretize function from the arules package returns bins labelled with their value ranges. Here is what worked for me.

# Adds a bin_<column> factor for every numeric column and keeps the originals
gapminder %>% 
  mutate(across(
    where(is.numeric),
    ~ arules::discretize(x = .x, method = "interval", breaks = 10),
    .names = "bin_{col}"
  ))

If you know how to do this within a recipe, do let me know.
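One possible way to stay inside a recipe, offered as a sketch rather than a confirmed best practice: copy the columns with step_mutate first, then discretize only the copies. The names pop_bin and gdpPercap_bin are made up for illustration, and this assumes step_discretize accepts the copied columns by name in your version of recipes. The bins will still be labelled bin01, bin02, ..., so this addresses only the first part of the question.

```r
library(gapminder)
library(tidymodels)

# Sketch: keep the originals by discretizing copies of them.
# pop_bin and gdpPercap_bin are hypothetical names chosen for this example.
binned <- recipe(lifeExp ~ ., data = gapminder) %>%
  step_mutate(pop_bin = pop, gdpPercap_bin = gdpPercap) %>%
  step_discretize(pop_bin, gdpPercap_bin, options = list(cuts = 10)) %>%
  prep() %>%
  juice()

# pop and gdpPercap stay numeric; the *_bin columns are factors
str(binned[, c("pop", "gdpPercap", "pop_bin", "gdpPercap_bin")])
```

Because the bin boundaries are learned in prep(), the same cut points should be reused when the recipe is baked on new data, which is one reason to prefer this over computing the bins outside the recipe.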

Upvotes: 1
