Reputation: 12570
I have a dataframe with some monthly data for 2 decades:
year month value
1960 January 925
1960 February 903
1960 March 1006
...
1969 December 892
1970 January 990
1970 February 866
...
1979 December 120
I would like to create a dataframe where I sum up the totals, for each decade, by month, as follows:
year month value
decade_60s January 4012
decade_60s February 8678
decade_60s March 9317
...
decade_60s December 3995
decade_70s January 8005
decade_70s February 9112
...
decade_70s December 325
I have been looking at the aggregate
function, but this doesn't appear to be the right option.
I looked instead at some careful subsetting using the which
function but this quickly became too messy.
For this kind of problem, what would be the correct approach? Will I need to use apply
at some point, and if so, how?
I feel the temptation to use a for
loop growing but I don't think this would be the best way to improve my skills in R..
Thanks for the advice.
PS: The month
value is an ordinal factor, if this matters.
Upvotes: 0
Views: 783
Reputation: 78832
plyr
's count
+ gsub
are definitely your friends here:
library(plyr)
dat <- structure(list(year = c(1960L, 1960L, 1960L, 1969L, 1970L, 1970L, 1979L),
month = structure(c(3L, 2L, 4L, 1L, 3L, 2L, 1L),
.Label = c("December", "February", "January", "March"),
class = "factor"),
value = c(925L, 903L, 1006L, 892L, 990L, 866L, 120L)),
.Names = c("year", "month", "value"),
class = "data.frame", row.names = c(NA, -7L))
dat$decade <- gsub("[0-9]$", "0", dat$year)
count(dat, .(decade, month), wt_var=.(value))
## decade month freq
## 1 1960 December 892
## 2 1960 February 903
## 3 1960 January 925
## 4 1960 March 1006
## 5 1970 December 120
## 6 1970 February 866
## 7 1970 January 990
Upvotes: 1
Reputation: 25914
Aggregate is a way to go using base R
First define the decade
yourdata$decade <- cut(yourdata$year, breaks=c(1960,1970,1980), labels=c(60,70),
include.lowest=TRUE, right=FALSE)
Then aggregate the data
aggregate(value ~ decade + month, data=yourdata , sum)
Then order to get required output
Upvotes: 1