MikeTP
MikeTP

Reputation: 7986

R: rolling/moving average split by month

I am trying to calculate a rolling/moving average by month. For example, using the economics data from the ggplot2 package, I want to construct an accompanying time series that represents the 3-year rolling average of each month.

library(ggplot2)
df = economics
df$month =  as.POSIXlt(df$date)$mon+1

I get exactly what I want when I split into monthly data manually:

library(zoo)
df.test = subset(df, month==1)
df.test$uempmed.ma = rollapply(df.test$unemploy,  width=3, FUN=mean, na.rm=T, 
                               fill=NA, align="right") 


  head(df.test)

         date   pce    pop psavert uempmed unemploy year month uempmed.ma
8  1968-01-31 534.7 199920     9.5     4.5     3001 1968     1         NA
20 1969-01-31 590.2 201881     6.5     4.9     2692 1969     1         NA
32 1970-01-31 635.7 204008     8.1     4.5     3453 1970     1   3048.667
44 1971-01-31 681.3 206668     9.9     6.3     4903 1971     1   3682.667
56 1972-01-31 738.4 209061     9.4     6.6     4928 1972     1   4428.000
68 1973-01-31 828.5 211120     9.5     5.2     4452 1973     1   4761.000

But, when I try to use the plyr package to do all months simultaneously ....

library(plyr)
df2 = ddply(df, .(month), mutate,
            uempmed.ma = rollapply(df$uempmed,  3, FUN=mean, na.rm=T, 
                                   fill=NA, align="right") 
)

....the following error is returned:

Error: wrong result size (478), expected 40 or 1

I know this should be fairly easy but, I am stumped.

Ultimately, I want the moving average series (i.e. uempmed.ma) to be lagged -- That is, not include the current year in the calculation. For example, the value for 1971-01-31 from above should be the average of uempmed for the 1968-01-31, 1969-01-31, and 1970-01-31 time periods.

Any assistance would be greatly appreciated.

Upvotes: 3

Views: 2179

Answers (1)

David Arenburg
David Arenburg

Reputation: 92282

dplyr (the new generation of plyr) seem to work

library(zoo)
library(dplyr)
df %>%
  group_by(month) %>%
  mutate(uempmed.ma = rollapply(unemploy, width = 3, 
         FUN = mean, na.rm = TRUE, fill = NA, align = "right"))

You can do it quite efficiently using base R too (certainly more efficiently than using plyr)

transform(df, uempmed.ma = ave(unemploy, month, 
                               FUN = function(x) rollapply(x, width = 3, 
                                                 FUN = mean, na.rm = TRUE, 
                                                 fill = NA, align = "right")))

Or a bonus solution: using data.table package

library(data.table)
setDT(df)[, uempmed.ma := rollapply(unemploy,  width = 3, 
                          FUN = mean, na.rm = TRUE, fill = NA, 
                          align = "right"), 
          by = month]

Upvotes: 7

Related Questions