Toby

Reputation: 543

How to compute prediction intervals for a circle fit in R

I wish to compute the prediction interval of the radius from a circle fit with the formula r² = (x-h)² + (y-k)², where r is the radius of the circle, x, y are Gaussian coordinates, and h, k mark the center of the fitted circle.

# data
x <- c(1,2.2,1,2.5,1.5,0.5,1.7)
y <- c(1,1,3,2.5,4,1.7,0.8)
# using nls.lm from minpack.lm (minimising the sum of squared residuals)
library(minpack.lm)

residFun <- function(par,x,y) {
  res <- sqrt((x-par$h)^2+(y-par$k)^2)-par$r
  return(res)
}
parStart <- list("h" = 1.5, "k" = 2.5, "r" = 1.7)
out <- nls.lm(par = parStart, fn = residFun, x = x, y = y, lower = NULL, upper = NULL)

The problem is that predict() does not work with nls.lm, hence I am trying to compute the circle fit using nlsLM. (I could compute it by hand, but I have trouble creating my design matrix.)

So this is what I tried next:

dat <- list("x" = x, "y" = y)
out1 <- nlsLM(y ~ sqrt(r^2 - (x-h)^2) + k, data = dat, start = parStart)

which results in:

Error in stats:::nlsModel(formula, mf, start, wts) : 
  singular gradient matrix at initial parameter estimates

Question 1a: How does nlsLM() work with circle fits (the advantage being that the generic predict() is available)? Question 1b: How do I get the prediction interval for my circle fit?

EXAMPLE from linear regression (this is what I want for the circle regression)

attach(faithful)     
eruption.lm = lm(eruptions ~ waiting) 
newdata = data.frame(waiting=seq(45,90, length = 272)) 
# confidence interval
conf <- predict(eruption.lm, newdata, interval="confidence") 
# prediction interval
pred <- predict(eruption.lm, newdata, interval="prediction")
# plot of the data [1], the regression line [1], confidence interval [2], and prediction interval [3]
plot(eruptions ~ waiting)
lines(conf[,1] ~ newdata$waiting, col = "black") # [1]
lines(conf[,2] ~ newdata$waiting, col = "red") # [2]
lines(conf[,3] ~ newdata$waiting, col = "red") # [2]
lines(pred[,2] ~ newdata$waiting, col = "blue") # [3]
lines(pred[,3] ~ newdata$waiting, col = "blue") # [3]

Kind regards

Summary of Edits:

Edit1: Rearranged the formula in nlsLM, but the parameter (h, k, r) results now differ between out and out1 ...

Edit2: Added two Wikipedia links to clarify the terminology used (cf. below):

confidence interval

prediction interval

Edit3: Some rephrasing of the question(s)

Edit4: Added a working example for linear regression

Upvotes: 2

Views: 944

Answers (3)

ben

Reputation: 467

I think this question is not answerable in its current form. Any predict() function that is based on a linear model requires the predicted variable to be a linear function of the input design matrix. r^2 = (x-x0)^2 + (y-y0)^2 is not a linear function of the design matrix (which would be something like [x0 x y0 y]), so I don't think you're going to find a linear model fit that will give you confidence intervals. If someone more clever than I am has a way to do it, though, I'd be very interested in hearing about it.

The general way to approach these sorts of problems is to create a hierarchical nonlinear model, where your hyperparameters would be x0 and y0 (your h and k) with a uniform distribution over your search space, and then r^2 would be distributed ~ N((x-x0)^2 + (y-y0)^2, sigma). You would then use MCMC sampling or similar to get your posterior credible intervals.
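A minimal sketch of that idea with a hand-rolled random-walk Metropolis sampler in base R is shown below; the per-point likelihood follows the N((x-x0)^2 + (y-y0)^2, sigma) statement above, while the flat priors, starting values, proposal scale and chain length are my own assumptions.

# Random-walk Metropolis for the circle model sketched above.
# Assumed likelihood: for every data point, the squared distance to the
# centre, (x-h)^2 + (y-k)^2, is ~ N(r^2, sigma^2); flat priors apart from
# positivity constraints on r and sigma.
set.seed(1)
x <- c(1, 2.2, 1, 2.5, 1.5, 0.5, 1.7)
y <- c(1, 1, 3, 2.5, 4, 1.7, 0.8)

log_post <- function(theta) {
  h <- theta[1]; k <- theta[2]; r <- theta[3]; sigma <- theta[4]
  if (r <= 0 || sigma <= 0) return(-Inf)
  d2 <- (x - h)^2 + (y - k)^2
  sum(dnorm(d2, mean = r^2, sd = sigma, log = TRUE))
}

n_iter <- 20000
chain  <- matrix(NA_real_, n_iter, 4,
                 dimnames = list(NULL, c("h", "k", "r", "sigma")))
theta  <- c(1.5, 2.5, 1.7, 1)   # start near the nls.lm estimates
lp     <- log_post(theta)

for (i in seq_len(n_iter)) {
  prop    <- theta + rnorm(4, sd = 0.1)   # random-walk proposal
  lp_prop <- log_post(prop)
  if (log(runif(1)) < lp_prop - lp) { theta <- prop; lp <- lp_prop }
  chain[i, ] <- theta
}

# posterior interval for the radius, after discarding burn-in
quantile(chain[-(1:5000), "r"], c(0.025, 0.5, 0.975))

The quantiles of the r draws give a posterior interval for the radius; a prediction interval for a new point would additionally use the sampled sigma.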

Upvotes: 1

IRTFM

Reputation: 263301

I am having a hard time figuring out what you want to do. Let me illustrate what the data looks like and something about the "prediction".

plot(x,y, xlim=range(x)*c(0, 1.5), ylim=range(y)*c(0, 1.5))
lines(out$par$h+c(-1,-1,1,1,-1)*out$par$r, # extremes of x-coord
      out$par$k+c(-1,1,1,-1 ,-1)*out$par$r, # extremes of y-coord
      col="red")

So what "prediction interval" are we speaking about? ( I do realize that you were thinking of a circle and if you just want to plot a circle on this background that's going to be pretty easy as well.)

lines(out$par$h+cos(seq(-pi,pi, by=0.1))*out$par$r, #center + r*cos(theta)
      out$par$k+sin(seq(-pi,pi, by=0.1))*out$par$r, #center + r*sin(theta)
      col="red")

[Plot: the data points with the bounding square and the fitted circle drawn in red]
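If a band around that fitted circle is what is wanted, one rough sketch (my own addition, not part of this answer) is to draw inner and outer circles at r plus or minus a t quantile times the residual standard error of the nls.lm fit; this ignores the uncertainty in h, k and r themselves, so it is only an approximation.

# Rough radial band around the fitted circle: r +/- t * s, using only the
# residual standard error of the nls.lm fit (parameter uncertainty ignored).
s  <- sqrt(out$deviance / (length(x) - 3))   # residual standard error, 3 parameters
tq <- qt(0.975, df = length(x) - 3)
th <- seq(-pi, pi, by = 0.01)
for (rr in out$par$r + c(-1, 1) * tq * s) {
  lines(out$par$h + cos(th) * rr,
        out$par$k + sin(th) * rr,
        col = "blue", lty = 2)
}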

Upvotes: 2

wespiserA

Reputation: 3168

Here's a solution to find h, k, r using base R's optim function. You essentially create a cost function that is a closure containing the data you wish to optimize over. I had to square the residuals (so the cost is the RSS), else we would go to -Inf. There is a local optima problem, so you need to run this a few times...

# data
x <- c(1,2.2,1,2.5,1.5,0.5,1.7)
y <- c(1,1,3,2.5,4,1.7,0.8)

residFunArg <- function(xVector, yVector) {
  # returns the cost function: the residual sum of squares, as a closure over the data
  function(theta, xVec = xVector, yVec = yVector) {
    sum((sqrt((xVec - theta[1])^2 + (yVec - theta[2])^2) - theta[3])^2)
  }
}

rFun <- residFunArg(x, y)

o <- optim(par = c(0, 0, 0), fn = rFun)


h = o$par[1]
k = o$par[2]
r = o$par[3]

Run this command in the REPL a few times to observe the local minima:

o <- optim(par = runif(3), fn = rFun, method = "CG"); o$par
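One way to work around those local minima (a sketch of my own, not part of the original answer) is a simple multi-start: run optim from several random starting points and keep the fit with the lowest cost.

# Multi-start: repeat the optimisation from random starting points and keep
# the run with the lowest cost (the starting range 0-3 is an assumption that
# roughly covers the data).
set.seed(42)
fits <- lapply(1:20, function(i) optim(par = runif(3, 0, 3), fn = rFun))
best <- fits[[which.min(sapply(fits, `[[`, "value"))]]
best$par  # h, k, r from the best of the 20 runs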

Upvotes: 0
