Bilal Khan

Reputation: 1

tidymodels step_corr() fails to remove highly correlated columns?

I can't seem to get step_corr() to function inside a recipe.

Minimal example:

library(recipes)

df <- data.frame(x1=runif(10)) %>% 
  mutate(x2=x1+1) %>% 
  mutate(y=x1+rnorm(10))

cor(df)

rec <- recipe(y~x1+x2, data = df) %>%
  step_corr(threshold=0.9) %>%
  prep(df)

bake(rec, new_data=df)

What am I doing wrong or misunderstanding? Thank you.

Upvotes: 0

Views: 40

Answers (1)

EmilHvitfeldt

Reputation: 3185

You forgot to select variables in step_corr(). All steps allow an empty selection, in which case the step does nothing.

library(recipes)

df <- data.frame(x1=runif(10)) %>% 
  mutate(x2=x1+1) %>% 
  mutate(y=x1+rnorm(10))

cor(df)
#>           x1        x2         y
#> x1 1.0000000 1.0000000 0.6882089
#> x2 1.0000000 1.0000000 0.6882089
#> y  0.6882089 0.6882089 1.0000000

rec <- recipe(y~x1+x2, data = df) %>%
  step_corr(all_predictors(), threshold=0.9) %>%
  prep(df)

bake(rec, new_data=df)
#> # A tibble: 10 × 2
#>       x2      y
#>    <dbl>  <dbl>
#>  1  1.06 -0.353
#>  2  1.53 -0.951
#>  3  1.87  2.51 
#>  4  1.43 -0.288
#>  5  1.60  0.696
#>  6  1.64  0.296
#>  7  1.31  1.16 
#>  8  1.07 -1.37 
#>  9  1.49 -0.215
#> 10  1.70  1.16

Created on 2024-08-05 with reprex v2.1.0
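
If you want to check what the step actually did, a rough sketch along these lines may help. It preps the recipe once without a selector and once with all_predictors(), then uses tidy() on the first step to list the columns marked for removal. The set.seed() value and the names rec_empty and rec_sel are my own additions for illustration, not part of the original example.

```r
library(recipes)  # also attaches dplyr, which provides mutate() and %>%

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
df <- data.frame(x1 = runif(10)) %>%
  mutate(x2 = x1 + 1, y = x1 + rnorm(10))

# No selector: the step has no columns to work on, so it removes nothing
rec_empty <- recipe(y ~ x1 + x2, data = df) %>%
  step_corr(threshold = 0.9) %>%
  prep()

# With a selector: the step sees both predictors and drops one of them
rec_sel <- recipe(y ~ x1 + x2, data = df) %>%
  step_corr(all_predictors(), threshold = 0.9) %>%
  prep()

# tidy() on a prepped step_corr() lists the columns that will be removed
tidy(rec_empty, number = 1)
tidy(rec_sel, number = 1)

# Column names left after applying the recipe to the training data
names(bake(rec_sel, new_data = NULL))
```

The first tidy() call should come back with zero rows and the second with one, which makes the difference between an empty selection and all_predictors() easy to spot.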

Upvotes: 1
