Old Pro

Reputation: 25557

Compute average over sliding time interval (7 days ago/later) in R

I've seen a lot of solutions for working with groups of times or dates, such as using aggregate to sum daily observations into weekly observations, or computing a moving average, but I haven't found a way to do what I want, which is to pluck relative dates out of data keyed by an additional variable.

I have daily sales data for a bunch of stores, held in a data.frame with columns

store_id date sales

It's nearly complete, but there are some missing data points, and those missing data points are having a strong effect on our models (I suspect). So I used expand.grid to make sure we have a row for every store and every date, but at this point the sales data for those missing data points are NAs. I've found solutions like
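For concreteness, the completion step looks roughly like the sketch below. The toy data and the names `obs` and `full_grid` are made up for illustration; the real data has many more stores and dates.

```r
# Toy observed data (made up): store B has no row for 2023-01-02
obs <- data.frame(
  store_id = c("A", "A", "A", "B", "B"),
  date     = as.Date(c("2023-01-01", "2023-01-02", "2023-01-03",
                       "2023-01-01", "2023-01-03")),
  sales    = c(10, 12, 11, 20, 22)
)

# Every store crossed with every date in the observed range
full_grid <- expand.grid(
  store_id = unique(obs$store_id),
  date     = seq(min(obs$date), max(obs$date), by = "day"),
  stringsAsFactors = FALSE
)

# Left join: store/date pairs absent from obs end up with sales = NA
dframe <- merge(full_grid, obs, by = c("store_id", "date"), all.x = TRUE)

# Keep rows ordered by date within each store
dframe <- dframe[order(dframe$store_id, dframe$date), ]
```

After this, dframe has one row per store per day, and the only NA in sales is the day that was missing for store B.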

dframe[is.na(dframe)] <- 0

or

dframe$sales[is.na(dframe$sales)] <- mean(dframe$sales, na.rm = TRUE)

but I'm not happy with the RHS of either of those. I want to replace missing sales data with our best estimate, and the best estimate of sales for a given store on a given date is the average of that store's sales 7 days prior and 7 days later. E.g. for Sunday the 8th, the average of Sunday the 1st and Sunday the 15th, because sales depend heavily on the day of the week.

So I guess I can use

dframe$sales[is.na(dframe$sales)] <- my_func(dframe)

where my_func(dframe) replaces each store's missing sales data with the average of that store's sales 7 days prior and 7 days later (ignoring, for the first pass, the situation where one of those data points is also missing), but I have no idea how to write my_func in an efficient way.

How do I match up the store_id and the dates 7 days prior and future without using a terribly inefficient for loop? Preferably using only base R packages.

Upvotes: 2

Views: 294

Answers (1)

thelatemail

Reputation: 93938

Something like:

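# Assumes dframe is sorted by date within each store_id, one row per day
dframe$sales <- with(
  dframe,
  ave(sales, store_id, FUN = function(x) {
    naw <- which(is.na(x))
    # Pad with 7 NAs on each side so the +/- 7 lookups never run off either end
    # (a negative index from naw - 7 would otherwise drop elements or error)
    xp <- c(rep(NA, 7), x, rep(NA, 7))
    # xp[naw] is x[naw - 7] and xp[naw + 14] is x[naw + 7]
    x[naw] <- rowMeans(cbind(xp[naw], xp[naw + 14]))
    x
  })
)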
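
If the rows are not guaranteed to be sorted by date with exactly one row per day, looking up the neighbours by key rather than by position may be safer. The following is only a sketch in base R, assuming date is of class Date; the toy data and the helper name impute_weekly are made up for illustration.

```r
# Toy data (made up): 2 stores x 21 days, sales = day index + a store offset
dframe <- expand.grid(
  store_id = c("A", "B"),
  date     = seq(as.Date("2023-01-01"), by = "day", length.out = 21)
)
dframe$sales <- as.numeric(dframe$date - as.Date("2023-01-01")) +
  ifelse(dframe$store_id == "A", 100, 200)

# Knock out one observation: store A on Sunday the 8th
dframe$sales[dframe$store_id == "A" &
             dframe$date == as.Date("2023-01-08")] <- NA

# Hypothetical helper: fill NA sales with the mean of the same store's sales
# `lag` days earlier and `lag` days later, matched on a store/date key
impute_weekly <- function(d, lag = 7) {
  key  <- paste(d$store_id, d$date)
  prev <- d$sales[match(paste(d$store_id, d$date - lag), key)]
  nxt  <- d$sales[match(paste(d$store_id, d$date + lag), key)]
  miss <- is.na(d$sales)
  # Stays NA when either neighbour is missing or falls outside the data
  d$sales[miss] <- ((prev + nxt) / 2)[miss]
  d
}

filled <- impute_weekly(dframe)
```

Here the missing value for store A on the 8th is filled with the mean of its sales on the 1st (100) and the 15th (114). Because match is vectorised, this avoids an explicit loop over rows.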

Upvotes: 0
