user2203351

Reputation: 45

How can I use leave-one-out cross-validation with the loess function?

I'm getting NA as my result. What am I doing wrong?

data(Boston, package='MASS')

x <- Boston$dis
y <- Boston$nox
n <- length(x)
cvs <- rep(0, n)

for(i in 1:n){
 xi <- x[-i]
 yi <- y[-i]
 d <- loess(yi~xi, span=0.2, degree=2)
 cvs[i] <- (y[i] - predict(d, newdata=data.frame(xi=x[i])))^2
}

mean(cvs)

Upvotes: 2

Views: 1514

Answers (3)

Dean Eckles

Reputation: 173

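You need to pass control = loess.control(surface = "direct") to loess() in order to extrapolate. With the default surface = "interpolate", predict() returns NA for points outside the range of the data used for fitting, which is what happens when the held-out point is the minimum or maximum of x.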

You may also want to find a faster way to do this, since the loop refits the model once per observation.
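As a minimal sketch, the loop from the question could be adapted like this. The names `train` and `fit` are illustrative and not from the original post; the only substantive change is the `control` argument.

```r
data(Boston, package = "MASS")

x <- Boston$dis
y <- Boston$nox
n <- length(x)
cvs <- numeric(n)

for (i in seq_len(n)) {
  # Training data with observation i held out
  train <- data.frame(xi = x[-i], yi = y[-i])
  # surface = "direct" evaluates the local fit exactly at the new point,
  # so predict() can also return values outside the training x range
  fit <- loess(yi ~ xi, data = train, span = 0.2, degree = 2,
               control = loess.control(surface = "direct"))
  cvs[i] <- (y[i] - predict(fit, newdata = data.frame(xi = x[i])))^2
}

anyNA(cvs)  # check whether any NA remains
mean(cvs)   # leave-one-out estimate of the squared prediction error
```

Note that predictions for the two extreme points are now extrapolations, so their squared errors may be less reliable than the rest.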

Upvotes: 0

IRTFM

Reputation: 263352

mean(cvs,na.rm=TRUE)
[1] 0.003753745

plot(y~x)
lines( d$x[order(d$x)], d$fitted[order(d$x)])
which(is.na(d$fitted[order(d$x)]) )
#integer(0)

You can see that the NAs come at the extremes of the x-range:

# add as debugging code at the end of the loop body (the final brace closes the for loop)
if(is.na(cvs[i]) ) {print(i);print(x[i])}}
[1] 354
[1] 12.1265
[1] 373
[1] 1.1296

 range(x)
#[1]  1.1296 12.1265

But I still don't understand why:

(y[ which(is.na(cvs)) ]-predict(d, x[ which(is.na(cvs)) ] ))^2
[1] 3.218139e-06 3.742504e-04
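One possible explanation, assuming `d` above is the fit left over from the last loop iteration: that model was trained on every point except the last one, so both extremes are inside its training range and predict() succeeds. In the iterations that hold out the minimum or the maximum, the held-out x lies outside the training range, and predict() on a loess fit with the default surface = "interpolate" does not extrapolate. The sketch below checks this for the two extreme points only; `train`, `fit`, `outside` and `pred_is_na` are illustrative names.

```r
data(Boston, package = "MASS")
x <- Boston$dis
y <- Boston$nox

# Indices of the smallest and largest x (reported above as 373 and 354)
extremes <- c(which.min(x), which.max(x))
outside <- logical(length(extremes))
pred_is_na <- logical(length(extremes))

for (k in seq_along(extremes)) {
  i <- extremes[k]
  train <- data.frame(xi = x[-i], yi = y[-i])
  fit <- loess(yi ~ xi, data = train, span = 0.2, degree = 2)
  # Is the held-out point outside the range of the remaining data?
  outside[k] <- x[i] < min(train$xi) || x[i] > max(train$xi)
  # Does the default (interpolating) fit return NA for it?
  pred_is_na[k] <- is.na(predict(fit, newdata = data.frame(xi = x[i])))
}

data.frame(index = extremes, x = x[extremes], outside, pred_is_na)
```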

Upvotes: 2

krlmlr

Reputation: 25454

Aggregate functions such as mean return NA by default if any input value is NA, and you have some in the result vector:

which(is.na(cvs))
## [1] 354 373

Not sure where they come from, but you could make the mean function ignore NA values by passing na.rm=TRUE.
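A small illustration of the difference, using a made-up vector (`cvs_example` is hypothetical, not the data from the question):

```r
# Made-up squared errors with two missing entries
cvs_example <- c(0.004, NA, 0.002, NA, 0.003)

mean_default <- mean(cvs_example)                # NA propagates by default
mean_no_na   <- mean(cvs_example, na.rm = TRUE)  # averages the 3 non-missing values
n_missing    <- sum(is.na(cvs_example))          # how many entries were dropped
```

Keep in mind that na.rm=TRUE silently excludes those observations from the cross-validation estimate, so it is worth checking how many were dropped.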

Upvotes: 1
