efz
efz

Reputation: 435

Summarise multiple columns by the value of another column

say I have a dataframe like:

 
country   tot.subcat cat                region_ar6_5               region_ar6_10              region_ar6_22 region_ar6_dev
  ABW     2.713934   2 Latin America and Caribbean Latin America and Caribbean                  Caribbean     developing
  ABW   955.775809   1 Latin America and Caribbean Latin America and Caribbean                  Caribbean     developing
  AFG  9508.812918   1 Asia and Developing Pacific               Southern Asia      Rest of Southern Asia            ldc
  AFG    47.666700   3 Asia and Developing Pacific               Southern Asia      Rest of Southern Asia            ldc
  AFG   236.350040   2 Asia and Developing Pacific               Southern Asia      Rest of Southern Asia            ldc
  AGO 30832.514020   1      Africa and Middle East                      Africa Southern and middle Africa            ldc

and I want to calculate the sum of tot.subcat for each group defined by cat and by the regions region_ar6_5, region_ar6_10 ,region_ar6_22, region_ar6_dev. The result should be the same as when calculating each summation independently

dataframe %>% group_by(cat,region_ar6_5) %>% summarise(sum_region_ar6_5=sum(tot.subcat))
dataframe %>% group_by(cat,region_ar6_10) %>% summarise(sum_region_ar6_10=sum(tot.subcat))
dataframe %>% group_by(cat,region_ar6_22) %>% summarise(sum_region_ar6_22=sum(tot.subcat))
dataframe %>% group_by(cat,region_ar6_dev) %>% summarise(sum_region_ar6_dev=sum(tot.subcat))

but I was looking for a more efficient solution. I tried with group_by_at(vars(matches('^region')), cat) %>% summarise(across(everything(), list(sum(tot.subcat)))) but doesn't work. the output of dput for the first 200 lines is here. any help appreciated

Upvotes: 1

Views: 78

Answers (2)

akrun
akrun

Reputation: 887118

As we are using different grouping variables, an option is to map over those variables

library(purrr)
library(dplyr)
library(stringr)
region_vars <- grep('region', names(dataframe), value = TRUE)
lst_out <- map(region_vars, ~ dataframe %>%
                  group_by(across(all_of(c('cat', .x)))) %>%
                  summarise(!! str_c("sum_", .x) := 
         sum(tot.subcat, na.rm = TRUE), .groups = 'drop'))

The output is a list of summarised output. If we want a single dataset output, the list can be reduced to a single dataset by joining

lst_out %>%
     reduce(full_join, by = 'cat')

data

dataframe <- structure(list(country = c("ABW", "ABW", "AFG", "AFG", "AFG", 
"AGO"), tot.subcat = c(2.713934, 955.775809, 9508.812918, 47.6667, 
236.35004, 30832.51402), cat = c(2L, 1L, 1L, 3L, 2L, 1L), 
region_ar6_5 = c("Latin America and Caribbean", 
"Latin America and Caribbean", "Asia and Developing Pacific", 
"Asia and Developing Pacific", "Asia and Developing Pacific", 
"Africa and Middle East"), region_ar6_10 = c("Latin America and Caribbean", 
"Latin America and Caribbean", "Southern Asia", "Southern Asia", 
"Southern Asia", "Africa"), region_ar6_22 = c("Caribbean", "Caribbean", 
"Rest of Southern Asia", "Rest of Southern Asia", "Rest of Southern Asia", 
"Southern and middle Africa"), region_ar6_dev = c("developing", 
"developing", "ldc", "ldc", "ldc", "ldc")),
class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 1

Duck
Duck

Reputation: 39595

Try something like this using reshaping:

library(tidyverse)
#Code
new <- dataframe %>% select(c(cat,starts_with('region'),tot.subcat)) %>%
  pivot_longer(-c(cat,tot.subcat)) %>%
  group_by(cat,name,value) %>%
  summarise(Total=sum(tot.subcat,na.rm=T)) %>%
  pivot_wider(names_from = name,values_from=Total,names_prefix = 'Total_')

Upvotes: 0

Related Questions