Reputation: 421
I have a dataset of cumulative data. I am trying to interpolate some missing values, but at some points the interpolated value is greater than the next observation. This is an example of my data:
library(tibble)
dat <- tibble(day = 1:30,
              value = c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
                        383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
                        NA, NA, 487, 487, 487, 487))
My dataset is quite long, and when I use smooth.spline to interpolate the missing values, some interpolated values are greater than the next observation, which is absurd for cumulative data. This is the output I get:
value.smspl <- c(278, 278, 278, 287.7574, 295.2348, 302, 316, 326.5689, 335,
359, 364.7916, 377.3012, 383, 403, 419, 419, 444, 439.765, 447.1823,
444, 464, 487, 487, 487, 521.6235, 526.3715, 487, 487, 487, 487)
My question is: can you somehow set boundaries for the interpolation so the result is reliable? If so, how could you do it?
Upvotes: 1
Views: 426
Reputation: 73265
Your data are monotonic, so you can use the "hyman" method in spline(), which produces a monotone cubic spline:
x <- dat$day
yi <- y <- dat$value
naInd <- is.na(y)
yi[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd], method = "hyman")$y
plot(x, y, pch = 19) ## non-NA data (black)
points(x[naInd], yi[naInd], pch = 19, col = 2) ## interpolation at NA (red)
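To confirm that the filled series never decreases, a quick check along these lines may help. It is a minimal sketch that rebuilds the question's data in a plain vector; `tol`, `is_monotone`, and `observed_unchanged` are names introduced here for illustration only.

```r
# Cumulative series from the question; NA marks the missing days.
y <- c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
       383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
       NA, NA, 487, 487, 487, 487)
x <- seq_along(y)
naInd <- is.na(y)

# Fill the gaps with the monotone "hyman" spline, as in the answer.
yi <- y
yi[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd], method = "hyman")$y

# Small tolerance (an assumption) to absorb floating-point noise.
tol <- 1e-8
is_monotone <- all(diff(yi) >= -tol)

# The observed values should be left untouched.
observed_unchanged <- identical(yi[!naInd], y[!naInd])

print(c(is_monotone = is_monotone, observed_unchanged = observed_unchanged))
```

Because the filled series is non-decreasing and the observations are unchanged, every filled value lies between its neighbouring observations.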
Package zoo has a number of functions to fill NA values, one of which is na.spline. As G. Grothendieck (a wizard for time series) suggests, the following does the same:
library(zoo)
library(dplyr)
dat %>% mutate(value.interp = na.spline(value, method = "hyman"))
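As a side note, base R's splinefun() also offers method = "monoH.FC" (monotone Hermite interpolation, Fritsch-Carlson). This is a different technique from the "hyman" spline used above, shown only as a possible alternative; the names `f_mono` and `y_fc` are hypothetical.

```r
# Same cumulative series as in the question.
y <- c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
       383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
       NA, NA, 487, 487, 487, 487)
x <- seq_along(y)
naInd <- is.na(y)

# Build a monotone Hermite interpolant from the observed points only.
f_mono <- splinefun(x[!naInd], y[!naInd], method = "monoH.FC")

# Evaluate it at the missing days and keep the observations as they are.
y_fc <- y
y_fc[naInd] <- f_mono(x[naInd])

# Small tolerance (an assumption) for floating-point noise on flat stretches.
fc_monotone <- all(diff(y_fc) >= -1e-8)
print(fc_monotone)
```

Both methods preserve monotonicity; they can differ slightly in the filled values, so it may be worth plotting both before choosing one.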
Upvotes: 5