wernor

Reputation: 421

Interpolate with splines without exceeding the next value in R

I have a dataset of cumulative data. I am trying to interpolate some missing values, but at some points the interpolated value is larger than the next observation. This is an example of my data:

library(tibble)

dat <- tibble(day = 1:30,
              value = c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
                        383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
                        NA, NA, 487, 487, 487, 487))

My dataset is quite long, and when I use smooth.spline to interpolate the missing values, some interpolated values are greater than the next observation, which is absurd for cumulative data. This is the output I get:

value.smspl <- c(278, 278, 278, 287.7574, 295.2348, 302, 316, 326.5689, 335, 
359, 364.7916, 377.3012, 383, 403, 419, 419, 444, 439.765, 447.1823, 
444, 464, 487, 487, 487, 521.6235, 526.3715, 487, 487, 487, 487)

[Plot: smooth.spline interpolation]

My question is: is there a way to set bounds on the interpolation so that the result stays consistent with cumulative data? If so, how?

Upvotes: 1

Views: 426

Answers (1)

Zheyuan Li

Reputation: 73265

Your data are monotonic (non-decreasing), so you can use the "hyman" method in spline(), which fits a monotone cubic spline:

x <- dat$day
yi <- y <- dat$value
naInd <- is.na(y)
yi[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd], method = "hyman")$y

plot(x, y, pch = 19)  ## non-NA data (black)
points(x[naInd], yi[naInd], pch = 19, col = 2)  ## interpolation at NA (red)
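
As a quick sanity check, the sketch below tests whether a filled series ever decreases. It rebuilds the question's data with base R only, so it runs without tibble. The helper is_nondecreasing() and its tolerance are illustrative assumptions, not part of any package.

```r
# Hypothetical helper: TRUE when a series never drops from one point to the next.
# The tolerance is an assumption, there only to absorb floating-point noise.
is_nondecreasing <- function(v, tol = 1e-8) all(diff(v) >= -tol)

day <- 1:30
value <- c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
           383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
           NA, NA, 487, 487, 487, 487)

# Output reported in the question (obtained there with smooth.spline).
value_smspl <- c(278, 278, 278, 287.7574, 295.2348, 302, 316, 326.5689, 335,
                 359, 364.7916, 377.3012, 383, 403, 419, 419, 444, 439.765,
                 447.1823, 444, 464, 487, 487, 487, 521.6235, 526.3715, 487,
                 487, 487, 487)

# Fill only the NA positions using the monotone "hyman" spline.
na_ind <- is.na(value)
filled <- value
filled[na_ind] <- spline(day[!na_ind], value[!na_ind],
                         xout = day[na_ind], method = "hyman")$y

smspl_ok <- is_nondecreasing(value_smspl)  # check the question's output
drops    <- which(diff(value_smspl) < 0)   # days after which that series falls
hyman_ok <- is_nondecreasing(filled)       # check the hyman result
```

Here drops lists the positions where the question's series decreases, and hyman_ok reports whether the "hyman" fill avoids that problem.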

[Plot: observed values in black, interpolated values at NA positions in red]


The zoo package has a number of functions for filling NA values, one of which is na.spline. As G. Grothendieck (a wizard for time series) suggests, the following does the same:

library(zoo)
library(dplyr)
dat %>% mutate(value.interp = na.spline(value, method = "hyman"))

Upvotes: 5
