jls
jls

Reputation: 224

Calculate the moving average of nth data points over a time series with NAs r

I have several months of meteorological data at a resolution of 30 minutes. I need to gap fill some missing data. In excel, I use a 14-day moving average to fill gaps. Specifically, if the data gap occurs at 11:00 a.m., I use data points ONLY at 11:00 a.m. from the previous 7 days and the following 7 days to find an average. This is absurdly simple to do in Excel, but I'm forbidding myself from using it. I want an r solution. There must be a simple answer, but I can't figure it out.

So far I've tried using rollmean and rollapply from zoo. I've tried several dpylr solutions and roll_mean from RcppRoll. The closest I've come to a fix is with zoo::rollapply. I think the trick lies in defining the "width" and "align" parameters. I need the window of measurements to be 672 (48 data points per day, over 14 days) with 336 on each side of the missing data point, but I only want to use the 48th measurement in that series of 672 AND I want to ignore NAs. Also, I'd like to know how to set a minimum requirement of available data points for the gap filling. Data points at the beginning and end of the time series should be NA.

I'm not going to provide a reproducible example from my massive dataset, but here is something to play with on a smaller scale:

data <- data.frame(hour =
c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23),
values = c(1,3,1,3,1,3,1,3,1,NA,1,3,1,3,NA,3,1,3,1,3,1,3,1,3))

Let's try to calculate a 6-hour rolling mean to fill the two gaps in this data set, using only data from even hours to fill even hours, and data from odd hours to fill odd hours. Therefore the gap at 10:00 should be 3 and the gap at 15:00 should be 1.

I tried this, it didn't work:

data$gap_fill <- rollapply(data$values, width = 6, by = 2, FUN = function(x) mean(x, 
na.rm=TRUE), by.column=TRUE, partial=TRUE, fill=NA, align="center")

Thanks in advance for your help!

Upvotes: 0

Views: 1344

Answers (1)

Nathan Werth
Nathan Werth

Reputation: 5273

rollapply is the right tool for the job, but your call needs some tweaking.

library(zoo)

data$gap_fill <- rollapply(
  data    = data$values,
  width   = 5,
  FUN     = function(x) {
    same_oddity <- seq(1, length(x), by = 2)
    mean(x[same_oddity], na.rm = TRUE)
  },
  partial = TRUE,
  fill    = NA,
  align   = "center"
)

data$imputed <- ifelse(
  is.na(data$values),
  data$gap_fill,
  data$values
)

data
#    hour values gap_fill imputed
# 1     0      1        1       1
# 2     1      3        1       3
# 3     2      1        1       1
# 4     3      3        3       3
# 5     4      1        1       1
# 6     5      3        3       3
# 7     6      1        1       1
# 8     7      3        3       3
# 9     8      1        1       1
# 10    9     NA        3       3
# 11   10      1        1       1
# 12   11      3        3       3
# 13   12      1        1       1
# 14   13      3        3       3
# 15   14     NA        1       1
# 16   15      3        3       3
# 17   16      1        1       1
# 18   17      3        3       3
# 19   18      1        1       1
# 20   19      3        3       3
# 21   20      1        1       1
# 22   21      3        3       3
# 23   22      1        1       1
# 24   23      3        3       3

What I changed:

  • The by parameter means windows will only be looked at around the byth elements of data. It doesn't affect how FUN works. You need to subset the windows within FUN.
  • by.column is useful when data is matrix-shaped. It says to do the process for each column and return a matrix.

Upvotes: 1

Related Questions