Reputation: 681
I am trying to use aggregate in R to summarize some data while also calculating values for an additional column using the following data
newdata
Year HNo County ST Month Day DuckBag GooseBag
2012 264120547 LA ND 10 13 6 0
2008 264080047 EDDY ND 9 27 4 1
2013 26430119 ROLETTE ND 10 20 3 0
2006 264060447 BURKE ND 10 25 5 0
2006 264061113 BENSON ND 10 2 3 1
2012 564120139 OLIVER ND 12 15 0 3
2013 26430294 TOWNER ND 10 10 2 0
2007 564070298 LOGAN ND 9 29 0 0
2007 564070869 SHERIDAN ND 10 21 0 0
2007 564070315 CASS ND 9 2 0 0
2005 264050791 SHERIDAN ND 10 15 3 0
2012 264120240 RAMSEY ND 11 1 6 0
2013 26431021 TOWNER ND 10 20 3 0
2013 56430774 NA ND 10 9 5 2
2006 264061288 BENSON ND 10 4 5 1
2005 264051006 EDDY ND 10 17 5 2
2010 264100848 MORTON ND 10 2 0 0
2011 264110151 CASS ND 10 8 4 1
2005 264051100 WARD ND 10 9 1 0
2013 26430194 MC ND 11 1 5 0
I would like to aggregate on Year and Month with a sum of DuckBag and GooseBag for each combination. In addition, I would like a count of how many rows in each Year/Month combination had a DuckBag or GooseBag >0.
I can get close with these pieces of code, but not quite what I want.
aggregate(newdata$DuckBag,list(Year = newdata$Year, Month = newdata$Month),sum)
aggregate(DuckBag ~ Year+Month,data = newdata,FUN=function(newdata) c(total =sum(newdata), n=length(newdata) ) )
Would dplyr work better? The dplyr code I have looked at seems cleaner, but not sure where to start with counts. Finally, though I'm sure it's asking too much, is there a way I can add a column that gives me the proportion of the appropriate Year's total sum represented by that Year/Month sum? Thank you so much.
Upvotes: 2
Views: 1331
Reputation: 887088
You could do this in aggregate
in one step also.
f1 <- function(x) c(Sum=sum(x), Positive=sum(x > 0)) #(just to make it clean)
res <- do.call(data.frame,aggregate(cbind(DuckBag,GooseBag)~Year+
Month, df, FUN=f1))
res
# Year Month DuckBag.Sum DuckBag.Positive GooseBag.Sum GooseBag.Positive
#1 2007 9 0 0 0 0
#2 2008 9 4 1 1 1
#3 2005 10 9 3 2 1
#4 2006 10 13 3 2 2
#5 2007 10 0 0 0 0
#6 2010 10 0 0 0 0
#7 2011 10 4 1 1 1
#8 2012 10 6 1 0 0
#9 2013 10 13 4 2 1
#10 2012 11 6 1 0 0
#11 2013 11 5 1 0 0
#12 2012 12 0 0 3 1
Upvotes: 3
Reputation: 20463
Here's my take:
library(dplyr)
results <- df %>%
group_by(Year, Month) %>%
summarise(Duck.Bag.Total = sum(DuckBag),
Goose.Bag.Total = sum(GooseBag),
Total.Sum = sum(Duck.Bag.Total, Goose.Bag.Total)) %>%
mutate(Duck.or.Goose.Positive = Duck.Bag.Total > 0 | Goose.Bag.Total > 0)
results
# Year Month Duck.Bag.Total Goose.Bag.Total Total.Sum Duck.or.Goose.Positive
# 1 2005 10 9 2 11 TRUE
# 2 2006 10 13 2 15 TRUE
# 3 2007 9 0 0 0 FALSE
# 4 2007 10 0 0 0 FALSE
# 5 2008 9 4 1 5 TRUE
# 6 2010 10 0 0 0 FALSE
# 7 2011 10 4 1 5 TRUE
# 8 2012 10 6 0 6 TRUE
# 9 2012 11 6 0 6 TRUE
# 10 2012 12 0 3 3 TRUE
# 11 2013 10 13 2 15 TRUE
# 12 2013 11 5 0 5 TRUE
Part II:
results2 <- results %>%
group_by(Year) %>%
summarise(Total.for.Year = sum(Total.Sum)) %>%
mutate(prop = Total.for.Year / sum(Total.for.Year))
results2
# Year Total.for.Year prop
# 1 2005 11 0.15492958
# 2 2006 15 0.21126761
# 3 2007 0 0.00000000
# 4 2008 5 0.07042254
# 5 2010 0 0.00000000
# 6 2011 5 0.07042254
# 7 2012 15 0.21126761
# 8 2013 20 0.28169014
Upvotes: 2
Reputation: 70266
Like this maybe?
group_by(df, Year, Month) %>%
summarise_each(funs(Sum = sum(.), Positive = sum(. > 0)), DuckBag, GooseBag)
#Source: local data frame [12 x 6]
#Groups: Year
#
# Year Month DuckBag_Sum GooseBag_Sum DuckBag_Positive GooseBag_Positive
#1 2005 10 9 2 3 1
#2 2006 10 13 2 3 2
#3 2007 9 0 0 0 0
#4 2007 10 0 0 0 0
#5 2008 9 4 1 1 1
#6 2010 10 0 0 0 0
#7 2011 10 4 1 1 1
#8 2012 10 6 0 1 0
#9 2012 11 6 0 1 0
#10 2012 12 0 3 0 1
#11 2013 10 13 2 4 1
#12 2013 11 5 0 1 0
Upvotes: 7