Reputation: 3
I have simulated bivariate data (x, y) where y has mean 1/x and some variance. The data looks something like this: [figure: Data]
I am using kernel smoothing regression to try and find this relationship.
kernelreg = ksmooth(train_points$x, train_points$y, kernel = "normal", bandwidth = h)
plot(y ~ x, train_points, cex = 0.5, col = "dodgerblue", main = "Data set")
lines(kernelreg, lwd = 2, col = 2)
I am wondering how I can write a function that runs this regression over a list of bandwidths and computes the RMSE on the training and test data, so that I can identify the optimum bandwidth, i.e. the one that minimizes the error of the model.
Upvotes: 0
Views: 1005
Reputation: 2584
You can put your model into a function and iterate over the bandwidth argument with lapply. Then you can simply calculate the RMSE for each run and take the min.
library(caret)#for RMSE() function
set.seed(5)
x <- runif(1000)
y <- 20*(1/exp(x*20))+runif(1000,1,5)
plot(x,y)
df <- data.frame(x,y)
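# keep 70% of the rows for training; sampling every row would leave test_points empty
ind <- sample(nrow(df), size = floor(0.7 * nrow(df)))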
train_points <- df[ind,]
test_points <- df[-ind,]
mykern <- function(x, y, bw) {
  # fit one smoother per bandwidth, evaluated at the observed x values
  kernelreg <- lapply(bw, function(h)
    ksmooth(x,
            y,
            kernel = "normal",
            bandwidth = h,
            x.points = x))
  names(kernelreg) <- bw
  # ksmooth() returns its fit in increasing x order, so sort y the same way
  y_sorted <- y[order(x)]
  rmse <- sapply(kernelreg, function(fit) RMSE(fit[["y"]], y_sorted, na.rm = TRUE))
  best <- which.min(rmse)
  ll <- list(best.model = kernelreg[[best]], best.bandwith = bw[best], rmse = rmse)
  return(ll)
}
kernelreg <- mykern(train_points$x,
                    train_points$y,
                    bw = seq(0.1, 1, 0.1))
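Since the question also asks about test error, here is a minimal sketch of a train/test comparison over a grid of bandwidths, using base R only. The 70/30 split, the bandwidth grid, and the helper names rmse and rmse_at are my own assumptions, not part of ksmooth. Training RMSE on its own tends to favor the smallest bandwidth, so the sketch picks the bandwidth by test RMSE.

```r
set.seed(5)
x <- runif(1000)
y <- 20 * exp(-20 * x) + runif(1000, 1, 5)
df <- data.frame(x, y)

# Hold out 30% of the rows as a test set (the 70/30 ratio is an assumption)
ind <- sample(nrow(df), size = floor(0.7 * nrow(df)))
train_points <- df[ind, ]
test_points  <- df[-ind, ]

# Plain RMSE; NA predictions (possible with very small bandwidths) are dropped
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2, na.rm = TRUE))

# Hypothetical helper: RMSE of a ksmooth() fit built on `train`,
# evaluated at the x values of `newdata`
rmse_at <- function(train, newdata, h) {
  fit <- ksmooth(train$x, train$y, kernel = "normal",
                 bandwidth = h, x.points = newdata$x)
  # ksmooth() returns predictions in increasing x order, so sort y to match
  rmse(fit$y, newdata$y[order(newdata$x)])
}

bandwidths <- seq(0.01, 0.5, by = 0.01)
results <- data.frame(
  bandwidth  = bandwidths,
  train_rmse = sapply(bandwidths, function(h) rmse_at(train_points, train_points, h)),
  test_rmse  = sapply(bandwidths, function(h) rmse_at(train_points, test_points, h))
)

# Bandwidth with the lowest test RMSE
best <- results[which.min(results$test_rmse), ]
print(best)
```

Plotting train_rmse and test_rmse against bandwidth from results shows where the two curves diverge. For a less split-dependent choice, the same rmse_at helper could be reused inside a k-fold cross-validation loop.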
However, take a look at the KernSmooth package, as suggested by the documentation of ksmooth:
This function was implemented for compatibility with S, although it is nowhere near as slow as the S function. Better kernel smoothers are available in other packages such as KernSmooth.
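As a rough illustration of that suggestion, the sketch below fits a local linear smoother with KernSmooth, using dpill() to choose a plug-in bandwidth and locpoly() to compute the fit. It swaps in a different technique (local linear regression with a data-driven bandwidth) for the grid search above, and the simulated data is only assumed to resemble the original. As far as I know, the bandwidth here is on a different scale from the one in ksmooth, so the two values should not be compared directly.

```r
library(KernSmooth)

set.seed(5)
x <- runif(1000)
y <- 20 * exp(-20 * x) + runif(1000, 1, 5)

# Plug-in bandwidth for local linear regression with a Gaussian kernel
h <- dpill(x, y)

# Local linear fit (degree = 1) evaluated on an equally spaced grid
fit <- locpoly(x, y, degree = 1, bandwidth = h)

plot(x, y, cex = 0.5, col = "dodgerblue", main = "Local linear fit")
lines(fit, lwd = 2, col = 2)
```

locpoly() returns a list with components x and y, so it can be passed straight to lines() in the same way as the result of ksmooth.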
Upvotes: 0