Powege
Powege

Reputation: 705

How to interpolate missing values in a time series, limited by the number of sequential NAs (R)?

I have missing values in a time series of dates. For example:

set.seed(101)

df <- data.frame(DATE = as.Date(c('2012-01-01', '2012-01-02', 
'2012-01-03', '2012-01-05', '2012-01-06', '2012-01-15', '2012-01-18', 
'2012-01-19', '2012-01-20', '2012-01-22')),
                 VALUE = rnorm(10, mean = 5, sd = 2))

How can I write a function that will fill all the missing dates between the first and last date (ie 2012-01-01 and 2012-01-22'), then use interpolation (linear and smoothing spline) to fill the missing values, but not more than 3 sequential missing values (ie no interpolation between 2012-01-06 and 2012-01-15)?

The function will be applied to a very large dataframe. I have been able to write a function that uses linear interpolation to fill all missing values between two dates (see below), but I can not figure out how to stop it interpolating long stretches of missing values.

interpolate.V <- function(df){

  # sort data by time 
  df <- df[order(df$DATE),]

  # linnearly interpolate VALUE for all missing DATEs
  temp <- with(df, data.frame(approx(DATE, VALUE, xout = seq(DATE[1], 
               DATE[nrow(df)], "day"))))
  colnames(temp) <- c("DATE", "VALUE_INTERPOLATED")
  temp$ST_ID <- df$ST_ID[1]
  out <- merge(df, temp, all = T)
  rm(temp)

  return(out)
}

Any help will be greatly appreciated!

Thanks

Upvotes: 0

Views: 1792

Answers (1)

Powege
Powege

Reputation: 705

Function that adds rows for all missing dates:

date.range <- function(sub){

  sub$DATE <- as.Date(sub$DATE)
  DATE <- seq.Date(min(sub$DATE), max(sub$DATE), by="day")
  all.dates <- data.frame(DATE)
  out <- merge(all.dates, sub, all = T)

  return(out)
}

Use na.approx or na.spline from zoo package with maxgap argument:

interpolate.zoo <- function(df){
  df$VALUE_INT <- na.approx(df$VALUE, maxgap = 3, na.rm = F)
  return(df)
}

Upvotes: 2

Related Questions