Bryan

Reputation: 1821

Interpolating values between three points in R

I have a dataset with observations for three years (e.g., 2000, 2005, and 2010) and need to interpolate values for the years in between using R. I tried a spline, but the interpolated values fall outside the original range; in the example below, some even become negative.

years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)
plot(spline(years, outcome_values, xout = seq(min(years), max(years))))
points(years, outcome_values, pch = 16)
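Why the curve dips: with only three points, the default "fmm" method of spline() has no fourth point from which to estimate its end conditions, and (as far as I can tell from how those end conditions are set) it then returns the single parabola through the three points. The sketch below checks that assumption by solving for the parabola directly. The shift t0 = years - 2000 is only there to keep the small linear system well conditioned.

```r
years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)

# interpolating spline (default method "fmm") evaluated at every year
sp <- spline(years, outcome_values, xout = seq(min(years), max(years)))

# parabola a + b*t + c*t^2 through the three points, with t = years since 2000
t0 <- years - 2000
coefs <- unname(solve(cbind(1, t0, t0^2), outcome_values))
t_out <- sp$x - 2000
parabola <- coefs[1] + coefs[2] * t_out + coefs[3] * t_out^2

all.equal(sp$y, parabola)  # TRUE if the spline is that parabola
sp$x[sp$y < 0]             # years with negative interpolated values
```

The parabola has to climb steeply from 10 to 90, so it overshoots downward between 2000 and 2005 to pass through the first two points.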


Someone described this situation and a Python solution using a lower-order spline (Smooth curved line between 3 points in plot and interpolate curve between three values), but I have not been able to figure out how to do this in R. Any pointers would be appreciated.

Upvotes: 0

Views: 1071

Answers (2)

Gregor Thomas

Reputation: 146224

Here's how to do it with a log transform on the outcome. This guarantees that the interpolated values are positive, and it changes the shape of the curve in a way you might like.

years = c(2000, 2005, 2010)
outcome_values = c(1, 10, 90)

# interpolate log(outcome) at every year between the observations
sp = spline(years, log(outcome_values), xout = seq(min(years), max(years)))
# back-transform with exp(), which is always positive
plot(sp$x, exp(sp$y))
points(years, outcome_values, pch = 16)
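If you want the back-transformed values one per year (for example, to merge into a data frame), the same log-scale idea can be wrapped with splinefun(). This is a minimal sketch: interp_log_spline() is a hypothetical helper name, and it assumes every observed value is strictly positive, since log() is undefined otherwise.

```r
# Hypothetical helper: spline-interpolate on the log scale, then back-transform.
# Assumes every value in y is strictly positive.
interp_log_spline <- function(x, y, xout) {
  stopifnot(all(y > 0))
  f <- splinefun(x, log(y))  # default "fmm" interpolating spline on log(y)
  data.frame(year = xout, value = exp(f(xout)))
}

years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)
res <- interp_log_spline(years, outcome_values, 2000:2010)
res
```

On the log scale the data rise almost linearly (0, 2.30, 4.50), so the interpolating curve there has little room to overshoot, and exp() keeps every back-transformed value positive.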


Upvotes: 1

slava-kohut

Reputation: 4233

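You can lower the degree of the spline, but this won't solve your problem; the negative estimates come from the shape of your data. With only three points, a quadratic spline is simply the parabola through them, and that parabola dips below zero between 2000 and 2005: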

library(splines)

years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)

# quadratic B-spline basis (no interior knots)
fit2 <- lm(outcome_values ~ bs(years, degree = 2))

plot(years, outcome_values, pch = 16)
lines(2000:2010, predict(fit2, data.frame(years = 2000:2010)), col = "blue")
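To see why lowering the degree doesn't help here: with three observations and no interior knots, the quadratic bs() basis plus the intercept gives three coefficients, so lm() fits the three points exactly and reproduces the same parabola as spline(). The sketch below compares the two curves year by year.

```r
library(splines)

years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)
new_years <- 2000:2010

# quadratic B-spline regression: 3 coefficients for 3 points, so an exact fit
fit2 <- lm(outcome_values ~ bs(years, degree = 2))
pred_bs <- unname(predict(fit2, data.frame(years = new_years)))

# default interpolating spline evaluated at the same years
pred_spline <- spline(years, outcome_values, xout = new_years)$y

all.equal(pred_bs, pred_spline)  # same curve, hence the same negative dip
```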

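That the spline produces negative predictions does not mean anything is wrong with it; the dip comes from the data. If the interpolated values must stay within the observed range, use linear interpolation instead (for example, with approx()).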
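A minimal sketch of the linear-interpolation route with base R's approx(). The second part is an addition not taken from either answer: if you still want a smooth curve, splinefun(method = "monoH.FC") fits a Fritsch-Carlson monotone cubic Hermite spline, which is monotone whenever the data are, so for increasing data like these it cannot leave the observed range.

```r
years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)
new_years <- 2000:2010

# linear interpolation: equal steps of 1.8 up to 2005, then steps of 16 up to 2010
lin <- approx(years, outcome_values, xout = new_years)$y
lin

# smooth alternative (not from the answers above): monotone Hermite spline
mono <- splinefun(years, outcome_values, method = "monoH.FC")(new_years)
range(mono)  # stays within the observed range 1 to 90
```

The linear version has kinks at the observed years; the monotone spline trades those kinks for a smooth curve while still never dipping below the first observation.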

Upvotes: 0
