conor

Reputation: 1314

Total of group_by summarize values

I have data and a plot like the example I give below.

I'd like to have a third "Condition" that is the Total Sum of Amount for Condition A and Condition B for a given Year and Month. I don't know how to do that since Condition is included in the group_by statement. In particular, I'd like to be able to plot it on the same plot as what appears below (so there'd be a third line for each Year showing the Total).

library(ggplot2)
library(dplyr)
data <- data.frame(Amount = sample(1:100, replace=T), 
               Condition = sample(c("A","B"), 100, replace=T),
               Year = sample(2015:2017, 100, replace=T),
               Month = sample(1:12, 100, replace=T))
dataGrouped <- data %>%
               group_by(Year, Month, Condition) %>%
               summarize(sumAmount = sum(Amount))
ggplot(dataGrouped, aes(x=Month, y=sumAmount, color=factor(Year), linetype=Condition)) + 
    geom_line(size=1) + scale_x_continuous(breaks = 1:12)

[Image: line plot of sumAmount by Month, colored by Year, with line type by Condition]

I've considered first doing a group_by(Year, Month) and then adding a Total, but I'm still not sure which way would be best (or whether there's a better alternative).

Upvotes: 3

Views: 7709

Answers (2)

bmosov01

Reputation: 599

Here's a dplyr solution that summarizes the total by Year and Month and then binds it to the grouped data with a Condition value of "Total", so that ggplot() will pick it up as a new line in your plot.

library(ggplot2)
library(dplyr)

data <- data.frame(Amount = sample(1:100, replace=T), 
                   Condition = sample(c("A","B"), 100, replace=T),
                   Year = sample(2015:2017, 100, replace=T),
                   Month = sample(1:12, 100, replace=T))

dataGrouped <- data %>%
  group_by(Year, Month, Condition) %>%
  summarize(sumAmount = sum(Amount))

ggplot(dataGrouped, aes(x=Month, y=sumAmount, color=factor(Year), linetype=Condition)) + 
  geom_line(size=1) + scale_x_continuous(breaks = 1:12)

dataWithTotal <- data %>%
  group_by( Year, Month ) %>%
  summarize( sumAmount = sum(Amount) ) %>%
  mutate( Condition = 'Total' ) %>%
  ungroup() %>%
  rbind( ungroup(dataGrouped) ) %>%
  mutate( Condition = as.factor(Condition) )

ggplot(dataWithTotal, aes(x=Month, y=sumAmount, color=factor(Year), linetype=Condition)) + 
  geom_line(size=1) + scale_x_continuous(breaks = 1:12)
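
As a quick sanity check on the approach above, one could confirm that each "Total" row equals the sum of the A and B rows for the same Year and Month. This is only a minimal sketch: set.seed(), stringsAsFactors = FALSE, the use of bind_rows() in place of rbind(), and the names check, total and parts are additions for illustration, not part of the original answer.

```r
library(dplyr)

# Rebuild the example data; the seed is an assumption added for reproducibility
set.seed(1)
data <- data.frame(Amount = sample(1:100, replace = TRUE),
                   Condition = sample(c("A", "B"), 100, replace = TRUE),
                   Year = sample(2015:2017, 100, replace = TRUE),
                   Month = sample(1:12, 100, replace = TRUE),
                   stringsAsFactors = FALSE)

# Per-condition sums, as in the answer
dataGrouped <- data %>%
  group_by(Year, Month, Condition) %>%
  summarize(sumAmount = sum(Amount)) %>%
  ungroup()

# Totals per Year and Month, stacked under the per-condition rows
dataWithTotal <- data %>%
  group_by(Year, Month) %>%
  summarize(sumAmount = sum(Amount)) %>%
  ungroup() %>%
  mutate(Condition = "Total") %>%
  bind_rows(dataGrouped)

# For every Year/Month, compare the Total row with the sum of the other rows
check <- dataWithTotal %>%
  group_by(Year, Month) %>%
  summarize(total = sum(sumAmount[Condition == "Total"]),
            parts = sum(sumAmount[Condition != "Total"])) %>%
  ungroup()

all(check$total == check$parts)
```

If the last line returns FALSE, the totals and the per-condition rows were probably built from different data.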

Upvotes: 4

Adam Quek

Reputation: 7153

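Use reshape2's melt and dcast to reshape the data into wide format, then add the total as a third condition, C:

library(reshape2)
library(dplyr)
library(tidyr)   # provides gather(), used further down
# Convert the grouping columns to factors so melt() treats them as id variables
data <- data %>% 
        mutate_at(vars(Condition, Year, Month), as.factor)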
dat <- melt(data) %>% 
       dcast(., Year + Month ~ Condition, sum)
dat <- dat %>% 
       mutate(C = A + B) %>% 
       mutate(Month = as.numeric(as.character(Month)))

Convert back to long format with gather (from tidyr):

dat <- dat %>% 
       gather(Condition, Amount, A:C)

Plot:

ggplot(dat, aes(Month, Amount,color=factor(Year), linetype=Condition)) + 
      geom_line() + scale_x_continuous(breaks = 1:12)
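
For readers on a recent tidyr, the same wide-then-long idea can probably be expressed without reshape2. The sketch below swaps in pivot_wider() and pivot_longer() for dcast() and gather(); it is an alternative, not the method used in this answer. It assumes tidyr 1.0.0 or later, and the seed and stringsAsFactors = FALSE are additions for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Rebuild the example data; the seed is an assumption added for reproducibility
set.seed(1)
data <- data.frame(Amount = sample(1:100, replace = TRUE),
                   Condition = sample(c("A", "B"), 100, replace = TRUE),
                   Year = sample(2015:2017, 100, replace = TRUE),
                   Month = sample(1:12, 100, replace = TRUE),
                   stringsAsFactors = FALSE)

dat <- data %>%
  group_by(Year, Month, Condition) %>%
  summarize(Amount = sum(Amount)) %>%
  ungroup() %>%
  # One column per condition; Year/Month pairs missing a condition get 0
  pivot_wider(names_from = Condition, values_from = Amount,
              values_fill = list(Amount = 0L)) %>%
  # C is the total of A and B, as in the answer
  mutate(C = A + B) %>%
  # Back to long format for plotting
  pivot_longer(cols = c(A, B, C), names_to = "Condition", values_to = "Amount")

# Build the plot object; print(p) to draw it
p <- ggplot(dat, aes(Month, Amount, color = factor(Year), linetype = Condition)) +
  geom_line() + scale_x_continuous(breaks = 1:12)
```

Month stays numeric throughout here, so the as.numeric(as.character(Month)) step is not needed in this variant.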

[Image: the same line plot with a third line type for condition C, the total of A and B]

Upvotes: 1
