Jeffrey

Reputation: 41

K fold cross validation in R

As far as I know, k-fold cross validation partitions the training dataset into k equal subsets, with every subset different. The R code for k-fold validation below is from R-bloggers. The data have 506 obs. and 14 variables, and the code uses 10 folds. My question is whether each fold gets a different subset, or whether some data points can be repeated across folds. I want every data point to be tested exactly once, so my goal is for each fold to contain different data points; a quick check of what I mean is sketched after the code.

set.seed(450)
cv.error <- NULL
k <- 10

library(plyr)
library(neuralnet)
pbar <- create_progress_bar('text')
pbar$init(k)

## (data, scaled and f are defined earlier in the R-bloggers post)
for(i in 1:k){
    ## draw a fresh random 90/10 split on every iteration
    index <- sample(1:nrow(data), round(0.9*nrow(data)))
    train.cv <- scaled[index,]
    test.cv <- scaled[-index,]

    ## fit the network on the training part
    nn <- neuralnet(f, data=train.cv, hidden=c(5,2), linear.output=T)

    ## predict on the held-out part and undo the min-max scaling of medv
    pr.nn <- compute(nn, test.cv[,1:13])
    pr.nn <- pr.nn$net.result*(max(data$medv)-min(data$medv))+min(data$medv)

    test.cv.r <- (test.cv$medv)*(max(data$medv)-min(data$medv))+min(data$medv)

    ## mean squared error on the held-out part
    cv.error[i] <- sum((test.cv.r - pr.nn)^2)/nrow(test.cv)

    pbar$step()
}
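
For example, this is a quick sketch of the check I have in mind: draw two "test sets" the same way the loop does and see whether they share any rows.

set.seed(450)
idx1 <- sample(1:nrow(data), round(0.9*nrow(data)))
idx2 <- sample(1:nrow(data), round(0.9*nrow(data)))
test1 <- setdiff(1:nrow(data), idx1)   ## rows held out in iteration 1
test2 <- setdiff(1:nrow(data), idx2)   ## rows held out in iteration 2
length(intersect(test1, test2))        ## > 0 means some rows would be tested twice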

Upvotes: 0

Views: 3412

Answers (2)

alan ocallaghan
alan ocallaghan

Reputation: 3038

That is not K-fold cross validation: on each iteration a new random sample is drawn, rather than assigning the samples to K folds up front and then cycling through them, using each fold as the test set in turn.

set.seed(450)
cv.error <- NULL
k <- 10

library(plyr)
library(neuralnet)
pbar <- create_progress_bar('text')
pbar$init(k)

## Assign samples to K folds initially
index <- sample(letters[seq_len(k)], nrow(data), replace=TRUE)
for(i in seq_len(k)) {
    ## Make all samples assigned the current letter the test set
    test_ind <- index == letters[[i]]
    test.cv <- scaled[test_ind, ]
    ## All other samples are assigned to the training set
    train.cv <- scaled[!test_ind, ]

    ## It is bad practice to use T instead of TRUE, 
    ## since T is not a reserved variable, and can be overwritten
    nn <- neuralnet(f,data=train.cv,hidden=c(5,2),linear.output=TRUE)

    pr.nn <- compute(nn,test.cv[,1:13])
    pr.nn <- pr.nn$net.result*(max(data$medv)-min(data$medv))+min(data$medv)

    test.cv.r <- (test.cv$medv) * (max(data$medv) - min(data$medv)) + min(data$medv)

    cv.error[i] <- sum((test.cv.r - pr.nn) ^ 2) / nrow(test.cv)

    pbar$step()
}
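
Note that sampling the fold labels with replace=TRUE keeps the folds disjoint, but their sizes will vary a little from fold to fold. You can inspect the sizes, or shuffle a balanced label vector instead if you want (near-)equal folds:

table(index)   ## how many samples landed in each fold

## alternative: (near-)equal fold sizes
index <- sample(rep(letters[seq_len(k)], length.out = nrow(data)))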

Then, to produce error estimates with less variance, I would repeat this process multiple times and visualise the distribution of cross-validation error across the repeated runs. I think you would be better off using a package which accomplishes tasks like this for you, such as the excellent caret.
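
For instance, here is a minimal sketch with caret, assuming the Boston housing data from MASS (506 obs., medv as the response) and the nnet backend; caret then handles the fold assignment, the repeats and the resampled error estimates for you.

library(caret)
library(MASS)

set.seed(450)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
fit <- train(medv ~ ., data = Boston, method = "nnet",
             preProcess = c("center", "scale"),
             trControl = ctrl, linout = TRUE, trace = FALSE)
fit$resample   ## per-fold, per-repeat performance estimates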

Upvotes: 0

BJK

Reputation: 153

You can shuffle the whole set of row indices outside of the loop. The following code might give you an idea of how to solve the problem.

set.seed(450)
cv.error <- NULL
k <- 10

library(plyr)
library(neuralnet)
pbar <- create_progress_bar('text')
pbar$init(k)

total_index <- sample(1:nrow(data), nrow(data))
    ## shuffle the whole index of samples

fold_id <- cut(seq_along(total_index), breaks = k, labels = FALSE)
    ## split the shuffled positions into k roughly equal, non-overlapping folds

for(i in 1:k){
    index <- total_index[fold_id == i]
        ## pick the samples assigned to fold i,
        ## so you avoid picking overlapping data points across validation sets
    train.cv <- scaled[-index,] ## the samples not in the index (training set)
    test.cv <- scaled[index,]   ## the held-out samples for validation

    nn <- neuralnet(f, data=train.cv, hidden=c(5,2), linear.output=TRUE)

    pr.nn <- compute(nn, test.cv[,1:13])
    pr.nn <- pr.nn$net.result*(max(data$medv)-min(data$medv))+min(data$medv)

    test.cv.r <- (test.cv$medv)*(max(data$medv)-min(data$medv))+min(data$medv)

    cv.error[i] <- sum((test.cv.r - pr.nn)^2)/nrow(test.cv)

    pbar$step()
}
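
Once the loop finishes, cv.error holds one mean squared error per fold; a small sketch of how you might summarise them:

mean(cv.error)   ## average test MSE across the k folds
sd(cv.error)     ## how much the fold estimates vary
boxplot(cv.error, main = "CV error across folds")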

Upvotes: 1
