How to recode multiple variables with different values for each variable

Question

I have a survey dataset that contains 100+ variables and almost all have 1-10 code values. The code values per column is provided in another df.

Example data:


survey_df = structure(list(resp_id = 1:5, gender = c("1", "2", "2", "1", 
"1"), state = c("1", "2", "3", "1", "4"), education = c("1", 
"1", "1", "2", "2")), class = "data.frame", row.names = c(NA, 
-5L))

coded_df = structure(list(col = c("state", "gender", "education"), col_values = c("1-CA,2-TX,3-AZ,4-CO", 
"1-Male,2-Female", "1-High School,2-Bachelor")), class = "data.frame", row.names = c(NA, 
-3L))

Since the survey columns change over time/product I wanted to avoid any hard coded recodes and hence have a function that will take input a column name and return a "named vector" from coded_df.

get_named_vec <- function(x) {
  tmp_chr <- coded_df %>%
    filter(col == x) %>%
    mutate(col_values = str_replace_all(col_values, "\n", "")) %>%
    separate_rows(col_values, sep = ",") %>%
    separate(col_values, into = c("var1", "var2"), sep = "-") %>%
    mutate(var1 = as.character(as.numeric(var1)), 
           var2 = str_trim(var2)) %>%
    pull(var2, var1)

  return(tmp_chr)
  
}

I then use the named vector as follows to update the survey_df.

survey_df%>%
 mutate(gender = recode(gender,!!!get_named_vec("gender"),.default = "NA_character_"))

So far this work on per column basis which means 100+ executions!

But how do I make this run via mutate_at so that I selectively recode certain variables in single execution.

# This does not work.
to_update_col<-c("state","gender")
survey_df%>%
  mutate_at(.vars=all_of(to_update_col),.funs=function(x) recode(x,!!!get_named_vec(x))))

Any help much appreciated!

Thanks

Vinay

Jon Spring · Accepted Answer

I expect it will be simpler and more performant to convert this into a pivot-join-pivot operation, where you make the source and lookup tables into long format, join them, and reshape wide again.

Given this survey info:

survey_df = structure(list(resp_id = 1:5, 
                           gender = c(1L, 2L, 2L, 1L, 1L), 
                           state = c(1, 2, 3, 1, 4), 
                           education = c(1L, 1L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -5L)) %>%
 mutate(across(-resp_id, as.character))

We can convert the lookup table into long form:

coded_df_long <- coded_df %>%
  separate_rows(col_values, sep = ",") %>%
  separate(col_values, c("old", "new"), extra = "merge")

Then pivot the survey long, join to the codings, and pivot wide again.

survey_df %>%
  pivot_longer(-resp_id) %>%
  left_join(coded_df_long, by = c("name" = "col", "value" = "old")) %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = new)

Result

# A tibble: 5 x 4
  resp_id gender state education  
              
1       1 Male   CA    High School
2       2 Female TX    High School
3       3 Female AZ    High School
4       4 Male   CA    Bachelor   
5       5 Male   CO    Bachelor

How to recode multiple variables with different values for each variable

Answers (1)

Related Questions