rcj001

Reputation: 13

How can I sum a column in a dataframe after a group_by?

I would like to group a dataframe with group_by and then sum a column within each group. So far, I've only been able to sum the entire column rather than each group separately.

I have a dataframe:

old_df <- data_frame(category1 = c("a", "a", "b", "b"),
                     category2 = c("2", "1", "3", "4"))

From here, I would like to group_by category1 ("a" and "b") and sum category2 for "a" and "b" individually. It would look like this:

new_df <- data_frame(category1 = c("a", "b"),
                     Sum_category2 = c("3", "7"))

I've tried a few things, and I thought this one below should work.

new_df <- old_df %>%
 group_by(category1) %>%
 summarize(Sum_category2 = sum(category2))

Everything I've tried so far just sums up the entire category2 column, which in this case would equal 10. How can I make it sum only within the grouping?

Upvotes: 0

Views: 80

Answers (1)

user1357015

Reputation: 11696

I'm not sure why you're storing category2 as strings; sum() needs numeric (or logical) input and throws an error on character values. With numeric values, the following works just fine.

library(dplyr)

old_df <- data.frame(category1 = c("a", "a", "b", "b"),
                     category2 = c(2, 1, 3, 4))

# Sum category2 within each level of category1
new_df <- old_df %>%
  group_by(category1) %>%
  summarize(sum_category = sum(category2))

new_df
# A tibble: 2 x 2
  category1 sum_category
  <fct>            <dbl>
1 a                    3
2 b                    7
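
If category2 really does arrive as strings, one option is to convert it before summing. The sketch below is only a minimal example, and it rests on two assumptions the question doesn't confirm: that every value is a numeric-looking string, and that the "sums the whole column" symptom might come from another package (commonly plyr, loaded after dplyr) masking summarize(). Prefixing the calls with dplyr:: rules out the second possibility.

```r
library(dplyr)

# Question's data, with category2 kept as character strings
old_df <- tibble(category1 = c("a", "a", "b", "b"),
                 category2 = c("2", "1", "3", "4"))

new_df <- old_df %>%
  # Assumption: every value parses as a number; anything else becomes NA
  dplyr::mutate(category2 = as.numeric(category2)) %>%
  dplyr::group_by(category1) %>%
  # Explicit namespace so a masking summarize() from another package is not used
  dplyr::summarize(Sum_category2 = sum(category2))

new_df
```

This should give one row per group (3 for "a", 7 for "b"). If the result still collapses to a single row, checking which summarize() is actually being called is a reasonable next step.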

Upvotes: 1
