Reputation: 435
say I have a dataframe
like:
country tot.subcat cat region_ar6_5 region_ar6_10 region_ar6_22 region_ar6_dev
ABW 2.713934 2 Latin America and Caribbean Latin America and Caribbean Caribbean developing
ABW 955.775809 1 Latin America and Caribbean Latin America and Caribbean Caribbean developing
AFG 9508.812918 1 Asia and Developing Pacific Southern Asia Rest of Southern Asia ldc
AFG 47.666700 3 Asia and Developing Pacific Southern Asia Rest of Southern Asia ldc
AFG 236.350040 2 Asia and Developing Pacific Southern Asia Rest of Southern Asia ldc
AGO 30832.514020 1 Africa and Middle East Africa Southern and middle Africa ldc
and I want to calculate the sum of tot.subcat
for each group defined by cat
and by the regions region_ar6_5, region_ar6_10 ,region_ar6_22, region_ar6_dev
. The result should be the same as when calculating each summation independently
dataframe %>% group_by(cat,region_ar6_5) %>% summarise(sum_region_ar6_5=sum(tot.subcat))
dataframe %>% group_by(cat,region_ar6_10) %>% summarise(sum_region_ar6_10=sum(tot.subcat))
dataframe %>% group_by(cat,region_ar6_22) %>% summarise(sum_region_ar6_22=sum(tot.subcat))
dataframe %>% group_by(cat,region_ar6_dev) %>% summarise(sum_region_ar6_dev=sum(tot.subcat))
but I was looking for a more efficient solution. I tried with group_by_at(vars(matches('^region')), cat) %>% summarise(across(everything(), list(sum(tot.subcat))))
but doesn't work. the output of dput for the first 200 lines is here. any help appreciated
Upvotes: 1
Views: 78
Reputation: 887118
As we are using different grouping variables, an option is to map
over those variables
library(purrr)
library(dplyr)
library(stringr)
region_vars <- grep('region', names(dataframe), value = TRUE)
lst_out <- map(region_vars, ~ dataframe %>%
group_by(across(all_of(c('cat', .x)))) %>%
summarise(!! str_c("sum_", .x) :=
sum(tot.subcat, na.rm = TRUE), .groups = 'drop'))
The output is a list
of summarised output. If we want a single dataset output, the list
can be reduce
d to a single dataset by joining
lst_out %>%
reduce(full_join, by = 'cat')
dataframe <- structure(list(country = c("ABW", "ABW", "AFG", "AFG", "AFG",
"AGO"), tot.subcat = c(2.713934, 955.775809, 9508.812918, 47.6667,
236.35004, 30832.51402), cat = c(2L, 1L, 1L, 3L, 2L, 1L),
region_ar6_5 = c("Latin America and Caribbean",
"Latin America and Caribbean", "Asia and Developing Pacific",
"Asia and Developing Pacific", "Asia and Developing Pacific",
"Africa and Middle East"), region_ar6_10 = c("Latin America and Caribbean",
"Latin America and Caribbean", "Southern Asia", "Southern Asia",
"Southern Asia", "Africa"), region_ar6_22 = c("Caribbean", "Caribbean",
"Rest of Southern Asia", "Rest of Southern Asia", "Rest of Southern Asia",
"Southern and middle Africa"), region_ar6_dev = c("developing",
"developing", "ldc", "ldc", "ldc", "ldc")),
class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 1
Reputation: 39595
Try something like this using reshaping:
library(tidyverse)
#Code
new <- dataframe %>% select(c(cat,starts_with('region'),tot.subcat)) %>%
pivot_longer(-c(cat,tot.subcat)) %>%
group_by(cat,name,value) %>%
summarise(Total=sum(tot.subcat,na.rm=T)) %>%
pivot_wider(names_from = name,values_from=Total,names_prefix = 'Total_')
Upvotes: 0