tjr
tjr

Reputation: 681

Aggregate and calculate in one step in r

I am trying to use aggregate in R to summarize some data while also calculating values for an additional column using the following data

newdata

Year    HNo County  ST  Month   Day DuckBag GooseBag
2012    264120547   LA  ND  10  13  6   0
2008    264080047   EDDY    ND  9   27  4   1
2013    26430119    ROLETTE ND  10  20  3   0
2006    264060447   BURKE   ND  10  25  5   0
2006    264061113   BENSON  ND  10  2   3   1
2012    564120139   OLIVER  ND  12  15  0   3
2013    26430294    TOWNER  ND  10  10  2   0
2007    564070298   LOGAN   ND  9   29  0   0
2007    564070869   SHERIDAN    ND  10  21  0   0
2007    564070315   CASS    ND  9   2   0   0
2005    264050791   SHERIDAN    ND  10  15  3   0
2012    264120240   RAMSEY  ND  11  1   6   0
2013    26431021    TOWNER  ND  10  20  3   0
2013    56430774    NA  ND  10  9   5   2
2006    264061288   BENSON  ND  10  4   5   1
2005    264051006   EDDY    ND  10  17  5   2
2010    264100848   MORTON  ND  10  2   0   0
2011    264110151   CASS    ND  10  8   4   1
2005    264051100   WARD    ND  10  9   1   0
2013    26430194    MC  ND  11  1   5   0

I would like to aggregate on Year and Month with a sum of DuckBag and GooseBag for each combination. In addition, I would like a count of how many rows in each Year/Month combination had a DuckBag or GooseBag >0.

I can get close with these pieces of code, but not quite what I want.

aggregate(newdata$DuckBag,list(Year = newdata$Year, Month = newdata$Month),sum)
aggregate(DuckBag ~ Year+Month,data = newdata,FUN=function(newdata) c(total =sum(newdata), n=length(newdata) ) )

Would dplyr work better? The dplyr code I have looked at seems cleaner, but not sure where to start with counts. Finally, though I'm sure it's asking too much, is there a way I can add a column that gives me the proportion of the appropriate Year's total sum represented by that Year/Month sum? Thank you so much.

Upvotes: 2

Views: 1331

Answers (3)

akrun
akrun

Reputation: 887088

You could do this in aggregate in one step also.

f1 <- function(x) c(Sum=sum(x), Positive=sum(x > 0)) #(just to make it clean)
res <- do.call(data.frame,aggregate(cbind(DuckBag,GooseBag)~Year+
                            Month, df, FUN=f1))
res
#   Year Month DuckBag.Sum DuckBag.Positive GooseBag.Sum GooseBag.Positive
#1  2007     9           0                0            0                 0
#2  2008     9           4                1            1                 1
#3  2005    10           9                3            2                 1
#4  2006    10          13                3            2                 2
#5  2007    10           0                0            0                 0
#6  2010    10           0                0            0                 0
#7  2011    10           4                1            1                 1
#8  2012    10           6                1            0                 0
#9  2013    10          13                4            2                 1
#10 2012    11           6                1            0                 0
#11 2013    11           5                1            0                 0
#12 2012    12           0                0            3                 1

Upvotes: 3

JasonAizkalns
JasonAizkalns

Reputation: 20463

Here's my take:

library(dplyr)
results <- df %>%
              group_by(Year, Month) %>%
              summarise(Duck.Bag.Total  = sum(DuckBag),
                        Goose.Bag.Total = sum(GooseBag),
                        Total.Sum       = sum(Duck.Bag.Total, Goose.Bag.Total)) %>%
              mutate(Duck.or.Goose.Positive = Duck.Bag.Total > 0 | Goose.Bag.Total > 0)

results
#    Year Month Duck.Bag.Total Goose.Bag.Total Total.Sum Duck.or.Goose.Positive
# 1  2005    10              9               2        11                   TRUE
# 2  2006    10             13               2        15                   TRUE
# 3  2007     9              0               0         0                  FALSE
# 4  2007    10              0               0         0                  FALSE
# 5  2008     9              4               1         5                   TRUE
# 6  2010    10              0               0         0                  FALSE
# 7  2011    10              4               1         5                   TRUE
# 8  2012    10              6               0         6                   TRUE
# 9  2012    11              6               0         6                   TRUE
# 10 2012    12              0               3         3                   TRUE
# 11 2013    10             13               2        15                   TRUE
# 12 2013    11              5               0         5                   TRUE

Part II:

results2 <- results %>%
              group_by(Year) %>%
              summarise(Total.for.Year = sum(Total.Sum)) %>%
              mutate(prop = Total.for.Year / sum(Total.for.Year))

results2
#   Year Total.for.Year       prop
# 1 2005             11 0.15492958
# 2 2006             15 0.21126761
# 3 2007              0 0.00000000
# 4 2008              5 0.07042254
# 5 2010              0 0.00000000
# 6 2011              5 0.07042254
# 7 2012             15 0.21126761
# 8 2013             20 0.28169014

Upvotes: 2

talat
talat

Reputation: 70266

Like this maybe?

group_by(df, Year, Month) %>%
  summarise_each(funs(Sum = sum(.), Positive = sum(. > 0)), DuckBag, GooseBag)
#Source: local data frame [12 x 6]
#Groups: Year
#
#   Year Month DuckBag_Sum GooseBag_Sum DuckBag_Positive GooseBag_Positive
#1  2005    10           9            2                3                 1
#2  2006    10          13            2                3                 2
#3  2007     9           0            0                0                 0
#4  2007    10           0            0                0                 0
#5  2008     9           4            1                1                 1
#6  2010    10           0            0                0                 0
#7  2011    10           4            1                1                 1
#8  2012    10           6            0                1                 0
#9  2012    11           6            0                1                 0
#10 2012    12           0            3                0                 1
#11 2013    10          13            2                4                 1
#12 2013    11           5            0                1                 0

Upvotes: 7

Related Questions