Ran K
Ran K

Reputation: 162

Summing together columns using mapping from a diffrent dataset

I would like to sum together different columns using a mapping from an another dataframe.

This is a simplified version of my datasets:

df_raw = data.frame(grp1_1 = c(1,2,1,2,1,2,1,2), grp1_2 = c(1,1,1,5,3,1,1,6),
                  grp2_1 = c(1,1,2,3,5,6,2,1), grp2_2 = c(1,1,2,3,4,5,6,7),
                  grp3_1 = c(1,2,0,9,8,0,1,2))

df_groups = data.frame(cols = c("grp1_1","grp1_2","grp2_1","grp2_2","grp3_1"),
                       group = c("group1","group1","group2","group2","group3"))

The end result should be:

# A tibble: 8 x 3
# Rowwise: 
  group1 group2 group3
   <dbl>  <dbl>  <dbl>
1      2      2      1
2      3      2      2
3      2      4      0
4      7      6      9
5      4      9      8
6      3     11      0
7      2      8      1
8      8      8      2

I already understand I would have to use transmute,rowwise and probably c_across in order to reach the desired output, something like this:

df_raw %>%
  rowwise() %>%  
  transmute(group1 =  sum(c_across(df_groups$cols[df_groups$group == "group1"])) ,
            group2 =  sum(c_across(df_groups$cols[df_groups$group == "group2"])),
            group3 =  sum(c_across(df_groups$cols[df_groups$group == "group3"])))

There are two main problems here:

  1. I need to pass a string from df_groups as a new variable name in the processed df.
  2. There has to be a way to create this mapping without repeating for each group separately, as I have 100 different columns that should be grouped into 30 groups.

Thank you.

Upvotes: 2

Views: 38

Answers (1)

akrun
akrun

Reputation: 887691

We may split the df_groups and loop over the list

library(purrr)
library(dplyr)
imap(split(df_groups$cols, df_groups$group), ~ 
   df_raw %>% 
    transmute(!! .y := rowSums(across(all_of(.x))))) %>%
    list_cbind

-output

  group1 group2 group3
1      2      2      1
2      3      2      2
3      2      4      0
4      7      6      9
5      4      9      8
6      3     11      0
7      2      8      1
8      8      8      2

Or another option is to reshape to 'long', join with the df_groups and reshape back to wide

library(tidyr)
df_raw %>%
   mutate(rn = row_number()) %>%
   pivot_longer(cols = -rn, names_to = 'cols') %>% 
   left_join(df_groups) %>% 
   dplyr::select(-cols) %>%
   pivot_wider(names_from = group, values_from = value, values_fn = sum) %>% 
   dplyr::select(-rn)

-output

# A tibble: 8 × 3
  group1 group2 group3
   <dbl>  <dbl>  <dbl>
1      2      2      1
2      3      2      2
3      2      4      0
4      7      6      9
5      4      9      8
6      3     11      0
7      2      8      1
8      8      8      2

Upvotes: 2

Related Questions