Reputation: 1044
I have a large dataframe (a small sample is below), and I need to convert all columns that start with the same prefix into multiple columns based on some conditions, keeping the original variables and carrying the original suffixes over to the new variables.
Data:
egp <- structure(list(EGP_2007 = structure(c("", "", "II", "", "", "", "", "", "V", "VI"), format.sas = "$"),
EGP_2008 = structure(c("", "", "IIIb", "", "", "", "IIIb", "", "V", "VI"), format.sas = "$"),
EGP_2009 = structure(c("", "", "IIIb", "", "", "", "I", "II", "V", "I"), format.sas = "$"),
EGP_2010 = structure(c("", "", "", "", "", "I", "", "II", "V", "I"), format.sas = "$"),
EGP_2011 = structure(c("I", "II", "", "", "", "I", "", "II", "V", "I"), format.sas = "$"),
EGP_2012 = structure(c("I", "II", "", "", "I", "VIIb", "I", "II", "I", "I"), format.sas = "$"),
EGP_2013 = structure(c("I", "II", "", "", "I", "VIIb", "IIIa", "II", "I", "I"), format.sas = "$"),
EGP_2014 = structure(c("I", "II", "", "IIIb", "I", "VIIb", "IIIa", "II", "I", "I"), format.sas = "$"),
EGP_2015 = structure(c("I", "IIIa", "", "IIIb", "I", "VIIb", "IIIa", "II", "I", "I"), format.sas = "$"),
EGP_2016 = structure(c("I", "IIIa", "", "IIIb", "I", "", "IIIa", "IIIa", "I", "I"), format.sas = "$"),
EGP_2017 = structure(c("", "", "", "IIIb", "I", "", "IIIa", "II", "I", "I"), format.sas = "$"),
EGP_2018 = structure(c("", "II", "", "IIIb", "I", "", "IIIa", "IIIa", "I", "IIIb"), format.sas = "$")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
What I tried:
I tried to adapt this SO answer to my problem, but I am getting the following error:
Error: Problem with `mutate()` input `..1`.
x Can't convert a double vector to function
i Input `..1` is `across(...)`.
Here is my code:
egp_2 <- egp %>%
mutate(across(contains("EGP"),
.fns = list(professional = case_when(. %in% c("I", "II") ~ 1,
. %in% c("IIIa", "IIIb", "V", "VI", "VIIa", "VIIb") ~ 0,
T ~ NA_real_),
routine_non_manual = case_when(. %in% c("IIIa", "IIIb", "V") ~ 1,
. %in% c("I", "II", "VI", "VIIa", "VIIb") ~ 0,
T ~ NA_real_),
manual = case_when(. %in% c("VI", "VIIa", "VIIb") ~ 1,
. %in% c("I", "II", "IIIa", "IIIb", "V") ~ 0,
T ~ NA_real_)),
.names = "{fn}_{col}" ))
Any solutions are appreciated. The original variables contain an occupational classification, and I want to convert it into subtype dummies for plots and regression.
Upvotes: 3
Views: 701
Reputation: 886928
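Each element of `.fns` has to be a function (or a `~` lambda). In the question, the `case_when()` calls are evaluated immediately and return a vector rather than a function, which is what the error is complaining about. Prefixing each call with `~` turns it into an anonymous function that `across()` applies to every selected column:
library(dplyr)

egp %>%
  mutate(across(contains("EGP"),
                # each list element is a lambda; `.` is the current column
                .fns = list(professional = ~ case_when(. %in% c("I", "II") ~ 1,
                                                       . %in% c("IIIa", "IIIb", "V", "VI", "VIIa", "VIIb") ~ 0,
                                                       TRUE ~ NA_real_),
                            routine_non_manual = ~ case_when(. %in% c("IIIa", "IIIb", "V") ~ 1,
                                                             . %in% c("I", "II", "VI", "VIIa", "VIIb") ~ 0,
                                                             TRUE ~ NA_real_),
                            manual = ~ case_when(. %in% c("VI", "VIIa", "VIIb") ~ 1,
                                                 . %in% c("I", "II", "IIIa", "IIIb", "V") ~ 0,
                                                 TRUE ~ NA_real_)),
                # new columns are named <function name>_<original column>
                .names = "{fn}_{col}"))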
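If the repeated `case_when()` calls get unwieldy, the same idea can be sketched with ordinary functions instead of `~` lambdas. This is only an illustration: `toy`, `egp_levels` and `make_dummy()` are hypothetical names made up for the example, and the two-column tibble is not the asker's data.

```r
library(dplyr)

# Hypothetical two-column sample (not the data from the question)
toy <- tibble(EGP_2007 = c("I", "V", ""),
              EGP_2008 = c("VI", "II", "IIIa"))

# All valid class labels; anything else (e.g. "") is coded as NA
egp_levels <- c("I", "II", "IIIa", "IIIb", "V", "VI", "VIIa", "VIIb")

# Hypothetical helper: returns a function that codes a column as 1/0/NA
make_dummy <- function(ones, all_levels) {
  function(x) case_when(x %in% ones ~ 1,
                        x %in% all_levels ~ 0,
                        TRUE ~ NA_real_)
}

toy_2 <- toy %>%
  mutate(across(starts_with("EGP"),
                .fns = list(professional = make_dummy(c("I", "II"), egp_levels),
                            manual = make_dummy(c("VI", "VIIa", "VIIb"), egp_levels)),
                .names = "{fn}_{col}"))

names(toy_2)
```

Because `make_dummy()` returns a function, every element of the `.fns` list is a function, which is what `across()` expects. The original columns are kept and the new ones carry the original suffix, e.g. `professional_EGP_2007`.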
Upvotes: 6