Reputation: 7986
I am trying to calculate a rolling/moving average by month. For example, using the economics data from the ggplot2 package, I want to construct an accompanying time series that represents the 3-year rolling average of each month.
library(ggplot2)
df = economics
df$month = as.POSIXlt(df$date)$mon+1
I get exactly what I want when I split into monthly data manually:
library(zoo)
df.test = subset(df, month==1)
df.test$uempmed.ma = rollapply(df.test$unemploy, width=3, FUN=mean, na.rm=T,
fill=NA, align="right")
head(df.test)
date pce pop psavert uempmed unemploy year month uempmed.ma
8 1968-01-31 534.7 199920 9.5 4.5 3001 1968 1 NA
20 1969-01-31 590.2 201881 6.5 4.9 2692 1969 1 NA
32 1970-01-31 635.7 204008 8.1 4.5 3453 1970 1 3048.667
44 1971-01-31 681.3 206668 9.9 6.3 4903 1971 1 3682.667
56 1972-01-31 738.4 209061 9.4 6.6 4928 1972 1 4428.000
68 1973-01-31 828.5 211120 9.5 5.2 4452 1973 1 4761.000
But, when I try to use the plyr package to do all months simultaneously ....
library(plyr)
df2 = ddply(df, .(month), mutate,
uempmed.ma = rollapply(df$uempmed, 3, FUN=mean, na.rm=T,
fill=NA, align="right")
)
....the following error is returned:
Error: wrong result size (478), expected 40 or 1
I know this should be fairly easy but, I am stumped.
Ultimately, I want the moving average series (i.e. uempmed.ma
) to be lagged -- That is, not include the current year in the calculation. For example, the value for 1971-01-31 from above should be the average of uempmed
for the 1968-01-31, 1969-01-31, and 1970-01-31 time periods.
Any assistance would be greatly appreciated.
Upvotes: 3
Views: 2179
Reputation: 92282
dplyr
(the new generation of plyr
) seem to work
library(zoo)
library(dplyr)
df %>%
group_by(month) %>%
mutate(uempmed.ma = rollapply(unemploy, width = 3,
FUN = mean, na.rm = TRUE, fill = NA, align = "right"))
You can do it quite efficiently using base R too (certainly more efficiently than using plyr
)
transform(df, uempmed.ma = ave(unemploy, month,
FUN = function(x) rollapply(x, width = 3,
FUN = mean, na.rm = TRUE,
fill = NA, align = "right")))
Or a bonus solution: using data.table
package
library(data.table)
setDT(df)[, uempmed.ma := rollapply(unemploy, width = 3,
FUN = mean, na.rm = TRUE, fill = NA,
align = "right"),
by = month]
Upvotes: 7