Reputation: 1821
I have a dataset with observations for three years (e.g., 2000, 2005, and 2010) and need to interpolate the values for the years in between using R. I tried using a spline, but the interpolated values fall outside the original range. In the example below they even become negative.
years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)
plot(spline(years, outcome_values, xout = seq(min(years), max(years))))
points(years, outcome_values, pch = 16)
Someone described this situation and a solution in Python using a lower-order spline (see "Smooth curved line between 3 points in plot" and "interpolate curve between three values"), but I have not been able to figure out how to do this in R. Any pointers would be appreciated.
Upvotes: 0
Views: 1071
Reputation: 146224
Here's how to do it with a log transform on the outcome. This guarantees that the interpolated values are positive, and it changes the shape of the curve in a way you might like.
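years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)
# fit the spline on the log scale, evaluating at every year from 2000 to 2010
sp <- spline(years, log(outcome_values), xout = seq(min(years), max(years)))
# back-transform with exp(), so every interpolated value is positive
plot(sp$x, exp(sp$y))
points(years, outcome_values, pch = 16)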
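If you want the interpolated numbers themselves rather than a plot, something along these lines should work. It is only a sketch of the same log-transform idea; the names `xout` and `interpolated` are made up here for illustration. Note that the log transform guarantees positive values, but it does not in general guarantee that the values stay between the smallest and largest observation.

```r
years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)

# one value per calendar year between the first and last observation
xout <- seq(min(years), max(years))

# spline on the log scale, then back-transform with exp()
sp <- spline(years, log(outcome_values), xout = xout)
interpolated <- data.frame(year = sp$x, value = exp(sp$y))

# exp() of any real number is positive, so this is always TRUE
all(interpolated$value > 0)

# the curve still passes through the observed points
interpolated[interpolated$year %in% years, ]
```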
Upvotes: 1
Reputation: 4233
You can lower the degree of the spline, but this won't solve your problem. It is the nature of your data that causes negative estimates:
library(splines)
years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)
# quadratic B-spline basis
fit2 <- lm(outcome_values ~ bs(years, degree = 2))
plot(years, outcome_values, pch = 16)
lines(2000:2010, predict(fit2, data.frame(years = 2000:2010)), col = "blue")
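To check the dip numerically instead of reading it off the plot, you can look at the predictions directly. This is a small sketch that refits the same model; `grid` and `pred` are names introduced here for illustration.

```r
library(splines)

years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)

# same quadratic fit as above
fit2 <- lm(outcome_values ~ bs(years, degree = 2))

# predict for every year in the observed span
grid <- data.frame(years = 2000:2010)
pred <- predict(fit2, grid)

# years whose predicted value falls below zero
grid$years[pred < 0]
```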
The fact that a spline produces negative predictions does not mean anything is wrong with the spline itself. If the interpolated values must stay within the range of the observations, use linear interpolation instead.
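A minimal sketch of linear interpolation with base R's `approx()`, assuming you want one value per calendar year (the names `xout` and `lin` are just for illustration). Because each interpolated value lies on the straight line between two neighbouring observations, it cannot leave the observed range.

```r
years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)

# one value per calendar year between the first and last observation
xout <- seq(min(years), max(years))

# piecewise-linear interpolation between the observed points
lin <- approx(years, outcome_values, xout = xout)

plot(lin$x, lin$y, type = "b")
points(years, outcome_values, pch = 16)

# every interpolated value stays within the observed range
range(lin$y)
```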
Upvotes: 0