Batch column aggregation and reordering dataframe in R

Question

I have census tract data divided my age variables by sex, into a value for males (varname_m) and females (varname_f):

Rows: 146,112
Columns: 13
$ tractid     "01001020100", "01001020100", "01001020200", "01001020200", "01001020300", "01001020300", "01001020400", "01001020400", "01001020500", "01001020500", "0100102060…
$ ag18to19_m  37, 57, 24, 15, 49, 27, 87, 33, 293, 159, 57, 40, 19, 41, 18, 56, 143, 86, 25, 155, 41, 7, 40, 0, 35, 0, 99, 25, 190, 420, 61, 157, 63, 110, 37, 127, 67, 45, 198…
$ ag20_m      6, 14, 64, 0, 11, 18, 16, 8, 115, 21, 42, 15, 53, 71, 16, 0, 63, 77, 43, 96, 32, 15, 21, 0, 12, 44, 8, 0, 105, 80, 34, 20, 8, 0, 13, 46, 88, 0, 83, 241, 10, 96, …
$ ag21_m      18, 0, 15, 7, 0, 16, 117, 18, 14, 40, 23, 26, 45, 47, 32, 0, 41, 50, 0, 76, 14, 45, 20, 1, 48, 11, 11, 30, 18, 30, 60, 55, 20, 0, 28, 43, 31, 21, 9, 0, 11, 8, 0,…
$ ag22to24_m  48, 64, 109, 45, 25, 62, 65, 41, 224, 531, 28, 51, 31, 60, 0, 24, 132, 96, 59, 98, 27, 45, 111, 30, 113, 58, 71, 61, 46, 114, 11, 86, 116, 99, 28, 158, 72, 135, …
$ ag25to29_m  49, 31, 83, 99, 87, 144, 153, 142, 428, 327, 69, 35, 36, 22, 61, 113, 202, 420, 184, 255, 94, 84, 118, 82, 71, 30, 47, 195, 44, 135, 118, 150, 215, 157, 118, 180…
$ ag30to34_m  52, 72, 59, 97, 84, 157, 124, 85, 415, 227, 95, 13, 105, 202, 37, 86, 274, 334, 161, 182, 91, 173, 84, 84, 81, 106, 79, 67, 263, 77, 40, 115, 199, 411, 81, 115, …
$ ag18to19_f  33, 8, 51, 7, 31, 19, 107, 15, 33, 25, 47, 37, 35, 81, 98, 92, 127, 147, 72, 0, 109, 57, 7, 74, 78, 0, 36, 24, 109, 268, 88, 62, 10, 0, 47, 33, 79, 191, 63, 134,…
$ ag20_f      13, 40, 23, 18, 27, 18, 12, 11, 37, 0, 58, 83, 19, 45, 20, 77, 16, 103, 0, 36, 15, 0, 8, 37, 29, 34, 36, 0, 23, 30, 37, 0, 10, 48, 51, 67, 17, 15, 125, 55, 27, 1…
$ ag21_f      40, 6, 13, 24, 36, 0, 16, 19, 17, 0, 11, 0, 0, 89, 28, 31, 39, 20, 15, 0, 7, 13, 0, 17, 9, 13, 17, 47, 106, 36, 42, 94, 0, 13, 19, 50, 67, 0, 122, 48, 21, 9, 145…
$ ag22to24_f  21, 67, 71, 21, 69, 35, 28, 165, 346, 350, 15, 0, 53, 50, 25, 42, 207, 165, 158, 114, 20, 0, 73, 66, 29, 29, 59, 39, 83, 94, 22, 24, 79, 69, 37, 21, 73, 201, 282…
$ ag25to29_f  36, 24, 86, 51, 88, 160, 130, 73, 318, 539, 157, 127, 128, 111, 86, 29, 334, 365, 87, 217, 57, 60, 177, 92, 17, 90, 86, 113, 67, 204, 136, 120, 130, 108, 211, 51…
$ ag30to34_f  36, 73, 38, 42, 87, 154, 63, 84, 440, 414, 51, 95, 151, 73, 27, 70, 429, 458, 231, 173, 54, 82, 104, 24, 61, 159, 69, 30, 218, 82, 88, 214, 222, 158, 76, 125, 24…

I want to aggregate each of the variables divided by sex to a single combined variable. For example, I want to add ag18to19_m and ag18to19_f to create ag18to19. I can easily do this using mutate and the following code and order them to the front of the data frame:

aggregated <- merged %>% 
  mutate(ag18to19 = ag18to19_m + ag18to19_f) %>% 
  relocate(ag18to19, .before = ag18to19_m)  %>% 
  
  mutate(ag20 = ag20_m + ag20_f) %>% 
  relocate(ag20, .before = ag20_m)  %>% 
  
  mutate(ag21 = ag21_m + ag21_f) %>% 
  relocate(ag21, .before = ag21_m)  %>% 
  
  mutate(ag22to24 = ag22to24_m + ag22to24_f) %>% 
  relocate(ag22to24, .before = ag22to24_m)  %>% 
  
  mutate(ag25to29 = ag25to29_m + ag25to29_f) %>% 
  relocate(ag25to29, .before = ag25to29_m)  %>% 
  
  mutate(ag30to34 = ag30to34_m + ag30to34_f) %>% 
  relocate(ag30to34, .before = ag30to34_m)

I know there's a more efficient way to do this using a loop or map_df function that will also give me a data frame as an output. I've been trying for the last hour to write a function and use map_df but haven't had any success. Does anyone have a suggestion?

More efficient code here is best practice and will help me apply this same data cleaning step to several other variables that are grouped in the same way (e.g., income grouped by sex or education grouped by age).

Any help would be greatly appreciated. Thanks.

akrun · Accepted Answer

Here is an option in tidyverse

library(dplyr)
library(stringr)
merged1 <- merged %>% 
     mutate(across(ends_with('_m'), ~ 
                   . + get(str_replace(cur_column(), '_m', '_f')),
                .names = '{.col}_new')) %>%
       rename_at(vars(ends_with('_new')),
              ~ str_remove(., '_[m]_new$')) %>%
       select(tract_id, order(names(.)[-1]) + 1)

Batch column aggregation and reordering dataframe in R

Answers (1)

Related Questions