Reputation: 3594
I don't know if I am not searching with the right terms but I can't find a post about this.
I have a df :
df <- data.frame(grouping_letter = c('A', 'A', 'B', 'B', 'C', 'C'), grouping_animal = c('Cat', 'Dog', 'Cat', 'Dog', 'Cat', 'Dog'), value = c(1,2,3,4,5,6))
I want to group by grouping_letter
and by grouping_animal
. I want to do this using dplyr
.
If I did it separately, it would be :
df %>% group_by(grouping_letter) %>% summarise(sum(value))
df %>% group_by(grouping_animal) %>% summarise(sum(value))
Now let's say, I have hundreds of columns I need to group by individually. How can I do this?
I was trying:
results <- NULL
for (i in grouping_columns) {
results[[i]] <- df %>% group_by(df$i) %>% summarize(sum(value))
}
I got a list called results with the output. I am wondering if there is a better way to do this instead of using a for-loop?
Upvotes: 1
Views: 1232
Reputation: 887018
We can create an index of 'grouping' colums (using grep
), loop over the index (with lapply
) and separately get the sum
of 'value' after grouping by the column in the 'index'.
library(dplyr)
i1 <- grep('grouping', names(df))
lapply(i1, function(i)
df[setdiff(seq_along(df), i)] %>%
group_by_(.dots=names(.)[1]) %>%
summarise(Sumvalue= sum(value)))
#[[1]]
#Source: local data frame [2 x 2]
# grouping_animal Sumvalue
# (fctr) (dbl)
#1 Cat 9
#2 Dog 12
#[[2]]
#Source: local data frame [3 x 2]
# grouping_letter Sumvalue
# (fctr) (dbl)
#1 A 3
#2 B 7
#3 C 11
Or we can do this by converting the dataset from 'wide' to 'long' format, then group by the concerned columns and get the sum
of 'value'.
library(tidyr)
gather(df, Var, Group, -value) %>%
group_by(Var, Group) %>%
summarise(Sumvalue = sum(value))
# Var Group Sumvalue
# (chr) (chr) (dbl)
#1 grouping_animal Cat 9
#2 grouping_animal Dog 12
#3 grouping_letter A 3
#4 grouping_letter B 7
#5 grouping_letter C 11
Upvotes: 2