LuizZ

Reputation: 1044

Mutate across and case_when with multiple output variables for each input variable

I have a large data frame (a small sample is below), and I need to convert every column that starts with the same prefix into multiple new columns based on some conditions, keeping the original variables and carrying the original suffixes over to the new variables.

Data:

egp <- structure(list(EGP_2007 = structure(c("", "", "II", "", "", "", "", "", "V", "VI"), format.sas = "$"), 
    EGP_2008 = structure(c("", "", "IIIb", "", "", "", "IIIb", "", "V", "VI"), format.sas = "$"), 
    EGP_2009 = structure(c("", "", "IIIb", "", "", "", "I", "II", "V", "I"), format.sas = "$"), 
    EGP_2010 = structure(c("", "", "", "", "", "I", "", "II", "V", "I"), format.sas = "$"), 
    EGP_2011 = structure(c("I", "II", "", "", "", "I", "", "II", "V", "I"), format.sas = "$"), 
    EGP_2012 = structure(c("I", "II", "", "", "I", "VIIb", "I", "II", "I", "I"), format.sas = "$"), 
    EGP_2013 = structure(c("I", "II", "", "", "I", "VIIb", "IIIa", "II", "I", "I"), format.sas = "$"), 
    EGP_2014 = structure(c("I", "II", "", "IIIb", "I", "VIIb", "IIIa", "II", "I", "I"), format.sas = "$"), 
    EGP_2015 = structure(c("I", "IIIa", "", "IIIb", "I", "VIIb", "IIIa", "II", "I", "I"), format.sas = "$"), 
    EGP_2016 = structure(c("I", "IIIa", "", "IIIb", "I", "", "IIIa", "IIIa", "I", "I"), format.sas = "$"), 
    EGP_2017 = structure(c("", "", "", "IIIb", "I", "", "IIIa", "II", "I", "I"), format.sas = "$"), 
    EGP_2018 = structure(c("", "II", "", "IIIb", "I", "", "IIIa", "IIIa", "I", "IIIb"), format.sas = "$")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

What I tried:

I tried to adapt this SO answer to my problem, but I am getting the following error:

Error: Problem with `mutate()` input `..1`.
x Can't convert a double vector to function
i Input `..1` is `across(...)`.

Here is my code:

egp_2 <- egp %>% 
  mutate(across(contains("EGP"),
                .fns = list(professional = case_when(. %in% c("I", "II") ~ 1,
                                                      . %in% c("IIIa", "IIIb", "V", "VI", "VIIa", "VIIb") ~ 0,
                                                      T ~ NA_real_),
                            routine_non_manual = case_when(. %in% c("IIIa", "IIIb", "V") ~ 1,
                                                      . %in% c("I", "II", "VI", "VIIa", "VIIb") ~ 0,
                                                      T ~ NA_real_),
                            manual = case_when(. %in% c("VI", "VIIa", "VIIb") ~ 1,
                                                      . %in% c("I", "II", "IIIa", "IIIb", "V") ~ 0,
                                                      T ~ NA_real_)),
                 .names = "{fn}_{col}" ))

Any solutions are appreciated. The original variables contain an occupational classification, and I want to convert it into dummy variables for each subtype, for use in plots and regressions.

Upvotes: 3

Views: 701

Answers (1)

akrun

Reputation: 886928

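Each element of `.fns` has to be a function, so every `case_when()` call needs to be wrapped in an anonymous function (the `~` lambda). Without the `~`, `case_when()` is evaluated immediately and `across()` receives a double vector instead of a function, which is what the error message is saying.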

 library(dplyr)

 egp %>% 
     mutate(across(contains("EGP"),
            # each list element is a lambda; `.` is the current column
            .fns = list(professional = ~ case_when(. %in% c("I", "II") ~ 1,
                                                  . %in% c("IIIa", "IIIb", "V", "VI", "VIIa", "VIIb") ~ 0,
                                                  TRUE ~ NA_real_),
                        routine_non_manual = ~ case_when(. %in% c("IIIa", "IIIb", "V") ~ 1,
                                                  . %in% c("I", "II", "VI", "VIIa", "VIIb") ~ 0,
                                                  TRUE ~ NA_real_),
                        manual = ~ case_when(. %in% c("VI", "VIIa", "VIIb") ~ 1,
                                                  . %in% c("I", "II", "IIIa", "IIIb", "V") ~ 0,
                                                  TRUE ~ NA_real_)),
             .names = "{fn}_{col}" ))
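
If the repetition bothers you, the three lambdas can share one helper. This is only a sketch: `dummy()`, `egp_classes`, and the `toy` tibble are names made up for illustration, and it assumes the eight class labels used above are the only valid ones. It also uses `starts_with()` and the `{.fn}`/`{.col}` spelling of the glue names, which should behave the same as `{fn}`/`{col}` on recent dplyr versions.

```r
library(dplyr)

# Assumption: these are the only valid EGP class labels in the data
egp_classes <- c("I", "II", "IIIa", "IIIb", "V", "VI", "VIIa", "VIIb")

# Hypothetical helper: 1 if x is in `ones`, 0 for any other known class,
# NA for everything else (e.g. the empty string)
dummy <- function(x, ones) {
  case_when(x %in% ones        ~ 1,
            x %in% egp_classes ~ 0,
            TRUE               ~ NA_real_)
}

# Small made-up sample, not the data from the question
toy <- tibble(EGP_2007 = c("", "II", "V", "VI"),
              EGP_2008 = c("I", "IIIb", "", "VIIb"))

out <- toy %>%
  mutate(across(starts_with("EGP"),
                .fns = list(professional       = ~ dummy(.x, c("I", "II")),
                            routine_non_manual = ~ dummy(.x, c("IIIa", "IIIb", "V")),
                            manual             = ~ dummy(.x, c("VI", "VIIa", "VIIb"))),
                .names = "{.fn}_{.col}"))
```

The original columns are kept, and each one gets three new columns named like `professional_EGP_2007`, `routine_non_manual_EGP_2007`, and `manual_EGP_2007`.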

Upvotes: 6
