Reputation: 705
I have missing values in a time series of dates. For example:
set.seed(101)
df <- data.frame(DATE = as.Date(c('2012-01-01', '2012-01-02',
'2012-01-03', '2012-01-05', '2012-01-06', '2012-01-15', '2012-01-18',
'2012-01-19', '2012-01-20', '2012-01-22')),
VALUE = rnorm(10, mean = 5, sd = 2))
How can I write a function that will fill all the missing dates between the first and last date (ie 2012-01-01 and 2012-01-22'), then use interpolation (linear and smoothing spline) to fill the missing values, but not more than 3 sequential missing values (ie no interpolation between 2012-01-06 and 2012-01-15)?
The function will be applied to a very large dataframe. I have been able to write a function that uses linear interpolation to fill all missing values between two dates (see below), but I can not figure out how to stop it interpolating long stretches of missing values.
interpolate.V <- function(df){
# sort data by time
df <- df[order(df$DATE),]
# linnearly interpolate VALUE for all missing DATEs
temp <- with(df, data.frame(approx(DATE, VALUE, xout = seq(DATE[1],
DATE[nrow(df)], "day"))))
colnames(temp) <- c("DATE", "VALUE_INTERPOLATED")
temp$ST_ID <- df$ST_ID[1]
out <- merge(df, temp, all = T)
rm(temp)
return(out)
}
Any help will be greatly appreciated!
Thanks
Upvotes: 0
Views: 1792
Reputation: 705
Function that adds rows for all missing dates:
date.range <- function(sub){
sub$DATE <- as.Date(sub$DATE)
DATE <- seq.Date(min(sub$DATE), max(sub$DATE), by="day")
all.dates <- data.frame(DATE)
out <- merge(all.dates, sub, all = T)
return(out)
}
Use na.approx or na.spline from zoo package with maxgap argument:
interpolate.zoo <- function(df){
df$VALUE_INT <- na.approx(df$VALUE, maxgap = 3, na.rm = F)
return(df)
}
Upvotes: 2