tumultous_rooster
tumultous_rooster

Reputation: 12570

Sum by months of the year with decades of data in R

I have a dataframe with some monthly data for 2 decades:

year    month   value
1960    January  925
1960    February 903
1960    March    1006
    ...
1969    December 892
1970    January  990
1970    February 866
    ...
1979    December 120

I would like to create a dataframe where I sum up the totals, for each decade, by month, as follows:

year        month    value
decade_60s  January  4012
decade_60s  February 8678
decade_60s  March    9317
    ...
decade_60s  December 3995
decade_70s  January  8005
decade_70s  February 9112
    ...
decade_70s  December 325

I have been looking at the aggregate function, but this doesn't appear to be the right option.
I looked instead at some careful subsetting using the which function but this quickly became too messy.

For this kind of problem, what would be the correct approach? Will I need to use apply at some point, and if so, how?

I feel the temptation to use a for loop growing but I don't think this would be the best way to improve my skills in R..

Thanks for the advice.

PS: The month value is an ordinal factor, if this matters.

Upvotes: 0

Views: 783

Answers (2)

hrbrmstr
hrbrmstr

Reputation: 78832

plyr's count + gsub are definitely your friends here:

library(plyr)

dat <- structure(list(year = c(1960L, 1960L, 1960L, 1969L, 1970L, 1970L, 1979L),
                      month = structure(c(3L, 2L, 4L, 1L, 3L, 2L, 1L), 
                      .Label = c("December", "February", "January", "March"), 
                      class = "factor"), 
                      value = c(925L, 903L, 1006L, 892L, 990L, 866L, 120L)), 
                      .Names = c("year", "month", "value"), 
                      class = "data.frame", row.names = c(NA, -7L))

dat$decade <- gsub("[0-9]$", "0", dat$year)

count(dat, .(decade, month), wt_var=.(value))
##  decade    month freq
## 1   1960 December  892
## 2   1960 February  903
## 3   1960  January  925
## 4   1960    March 1006
## 5   1970 December  120
## 6   1970 February  866
## 7   1970  January  990

Upvotes: 1

user20650
user20650

Reputation: 25914

Aggregate is a way to go using base R

First define the decade

yourdata$decade <- cut(yourdata$year, breaks=c(1960,1970,1980), labels=c(60,70), 
                                           include.lowest=TRUE, right=FALSE)

Then aggregate the data

aggregate(value ~ decade + month, data=yourdata , sum) 

Then order to get required output

Upvotes: 1

Related Questions