Reputation: 253
I have a survey dataset that contains 100+ variables and almost all have 1-10 code values. The code values per column is provided in another df.
Example data:
survey_df = structure(list(resp_id = 1:5, gender = c("1", "2", "2", "1",
"1"), state = c("1", "2", "3", "1", "4"), education = c("1",
"1", "1", "2", "2")), class = "data.frame", row.names = c(NA,
-5L))
coded_df = structure(list(col = c("state", "gender", "education"), col_values = c("1-CA,2-TX,3-AZ,4-CO",
"1-Male,2-Female", "1-High School,2-Bachelor")), class = "data.frame", row.names = c(NA,
-3L))
Since the survey columns change over time/product I wanted to avoid any hard coded recodes and hence have a function that will take input a column name and return a "named vector" from coded_df.
get_named_vec <- function(x) {
tmp_chr <- coded_df %>%
filter(col == x) %>%
mutate(col_values = str_replace_all(col_values, "\\n", "")) %>%
separate_rows(col_values, sep = ",") %>%
separate(col_values, into = c("var1", "var2"), sep = "-") %>%
mutate(var1 = as.character(as.numeric(var1)),
var2 = str_trim(var2)) %>%
pull(var2, var1)
return(tmp_chr)
}
I then use the named vector as follows to update the survey_df.
survey_df%>%
mutate(gender = recode(gender,!!!get_named_vec("gender"),.default = "NA_character_"))
So far this work on per column basis which means 100+ executions!
But how do I make this run via mutate_at so that I selectively recode certain variables in single execution.
# This does not work.
to_update_col<-c("state","gender")
survey_df%>%
mutate_at(.vars=all_of(to_update_col),.funs=function(x) recode(x,!!!get_named_vec(x))))
Any help much appreciated!
Thanks
Vinay
Upvotes: 1
Views: 467
Reputation: 66415
I expect it will be simpler and more performant to convert this into a pivot-join-pivot operation, where you make the source and lookup tables into long format, join them, and reshape wide again.
Given this survey info:
survey_df = structure(list(resp_id = 1:5,
gender = c(1L, 2L, 2L, 1L, 1L),
state = c(1, 2, 3, 1, 4),
education = c(1L, 1L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -5L)) %>%
mutate(across(-resp_id, as.character))
We can convert the lookup table into long form:
coded_df_long <- coded_df %>%
separate_rows(col_values, sep = ",") %>%
separate(col_values, c("old", "new"), extra = "merge")
Then pivot the survey long, join to the codings, and pivot wide again.
survey_df %>%
pivot_longer(-resp_id) %>%
left_join(coded_df_long, by = c("name" = "col", "value" = "old")) %>%
select(-value) %>%
pivot_wider(names_from = name, values_from = new)
Result
# A tibble: 5 x 4
resp_id gender state education
<int> <chr> <chr> <chr>
1 1 Male CA High School
2 2 Female TX High School
3 3 Female AZ High School
4 4 Male CA Bachelor
5 5 Male CO Bachelor
Upvotes: 1