Lulliter

Reputation: 107

function argument inside dplyr::across for transformation on multiple columns

I want to recode some [0,1] variables into factors with levels ["no","yes"] using dplyr::across. I made it work, but I want to understand how to define the function argument inside across in both syntax options: passing a bare function name to .fns, and passing a ~ lambda.

library(dplyr)
library(forcats)

# toy dataset
df = data.frame(
  var_a = c(0,1,0,1,0,1,0,1,0,1),
  var_b = c(0,1,0,1,0,1,0,1,0,1),
  number_a = 1:10,
  number_b = 21:30)

# selection of columns 
some_cols = c("var_a", "var_b" )


# ---- 1) class transformation
# this works 
df2_a <- df %>% 
  mutate(across(.cols = some_cols, .fns = as.factor, .names = "{.col}_f")) 
# this works too 
df2_b <- df %>% 
  mutate(across(some_cols, ~ as.factor(.x), .names = "{.col}_f")) 


# ---- 2) Change factor levels 
# this DOES NOT work !!!
df3_a <- df2_a %>% 
  mutate(across(.cols = ends_with("_f"), 
                .fns = fct_recode, c(yes = "1", no = "0" ))) 
# this works  
df3_b <- df2_b %>% 
  mutate(across(ends_with("_f"), ~ fct_recode(.x , yes = "1", no = "0" )))

What am I doing wrong in df3_a?

Upvotes: 0

Views: 101

Answers (1)

stefan

Reputation: 125398

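The issue is that each argument passed to fct_recode through ... has to be a single named string (new = "old"), not one vector holding several named strings. across forwards your extra argument as is, so what you are doing is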

~ fct_recode(.x, c(yes = "1", no = "0" ))

whereas it should be

~ fct_recode(.x, yes = "1", no = "0" )
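The difference can be checked outside of across. The following is a minimal sketch on a made-up factor `f` (not the question's data). The names `res_bad` and `res_ok` are only illustrative, and the exact error text depends on the installed forcats version, so it is not reproduced here.

```r
library(forcats)

# Toy factor with the same levels as var_a_f / var_b_f
f <- factor(c("0", "1", "0"))

# One unnamed argument that holds a named vector: fct_recode() rejects it
res_bad <- tryCatch(
  fct_recode(f, c(yes = "1", no = "0")),
  error = function(e) "error"
)

# Two separate named strings, each mapping new level = old level
res_ok <- fct_recode(f, yes = "1", no = "0")

res_bad
levels(res_ok)
```
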

Hence, to make your code work do

library(dplyr, warn.conflicts = FALSE)
library(forcats)

df %>%
  mutate(
    across(
      all_of(some_cols),
      .fns = as.factor, .names = "{.col}_f"
    ),
    across(
      ends_with("_f"),
      .fns = fct_recode, yes = "1", no = "0"
    )
  )
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(ends_with("_f"), .fns = fct_recode, yes = "1", no =
#>   "0")`.
#> Caused by warning:
#> ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
#> Supply arguments directly to `.fns` through an anonymous function instead.
#> 
#>   # Previously
#>   across(a:b, mean, na.rm = TRUE)
#> 
#>   # Now
#>   across(a:b, \(x) mean(x, na.rm = TRUE))
#>    var_a var_b number_a number_b var_a_f var_b_f
#> 1      0     0        1       21      no      no
#> 2      1     1        2       22     yes     yes
#> 3      0     0        3       23      no      no
#> 4      1     1        4       24     yes     yes
#> 5      0     0        5       25      no      no
#> 6      1     1        6       26     yes     yes
#> 7      0     0        7       27      no      no
#> 8      1     1        8       28     yes     yes
#> 9      0     0        9       29      no      no
#> 10     1     1       10       30     yes     yes

However, as the warning tells us, the ... argument of across() was deprecated in dplyr 1.1.0. Instead, one should now pass an anonymous function, as you did in df3_b.
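If you would rather keep the mapping in a single named vector, as in df3_a, one option is to splice it into fct_recode with !!!, which the forcats documentation describes for this purpose. The sketch below assumes R >= 4.1 for the \(x) shorthand and uses a shortened toy data frame. `level_map` and `out` are names introduced here for illustration only.

```r
library(dplyr, warn.conflicts = FALSE)
library(forcats)

# Shortened toy data, same shape as the question's df
df <- data.frame(
  var_a = c(0, 1, 0, 1),
  var_b = c(1, 0, 1, 0)
)

# Illustrative lookup vector: names are the new levels, values the old ones
level_map <- c(no = "0", yes = "1")

out <- df %>%
  mutate(
    # Step 1: create factor copies of the selected columns
    across(all_of(c("var_a", "var_b")), as.factor, .names = "{.col}_f"),
    # Step 2: anonymous function instead of the deprecated `...`;
    # `!!!` splices the vector into separate named arguments
    across(ends_with("_f"), \(x) fct_recode(x, !!!level_map))
  )

levels(out$var_a_f)
```

This produces no deprecation warning, because nothing is passed through the ... of across.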

Upvotes: 1
