Reputation: 53
I try to use kknn
+ loop to create a leave-out-one cross validation for a model, and compare that with train.kknn.
I have split the data into two parts: training (80% data), and test (20% data). In the training data, I exclude one point in the loop to manually create LOOCV.
I think something gets wrong in predict(knn.fit, data.test)
. I have tried to find how to predict in kknn
through the kknn package instruction and online but all the examples are "summary(model)" and "table(validation...)" rather than the prediction on a separate test data. The code predict(model, dataset)
works successfully in train.kknn
function, so I thought I could use the similar arguments in kknn.
I am not sure if there is such a prediction function in kknn
. If yes, what arguments should I give?
Look forward to your suggestion. Thank you.
library(kknn)
for (i in 1:nrow(data.train)) {
train.data <- data.train[-i,]
validation.data <- data.train[i,]
knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
kernel = "rectangular", scale = TRUE)
# train.data + validation.data is the 80% data I split.
}
pred.knn <- predict(knn.fit, data.test) # data.test is 20% data.
Here is the error message:
Error in switch(type, raw = object$fit, prob = object$prob, stop("invalid type for prediction")) : EXPR must be a length 1 vector
Actually I try to compare train.kknn and kknn+loop to compare the results of the leave-out-one CV. I have two more questions:
1) in kknn
: is it possible to use another set of data as test data to see the knn.fit prediction?
2) in train.kknn
: I split the data and use 80% of the whole data and intend to use the rest 20% for prediction. Is it an correct common practice?
2) Or should I just use the original data (the whole data set) for train.kknn
, and create a loop: data[-i,]
for training, data[i,]
for validation in kknn
? So they will be the counterparts?
I find that if I use the training data in the train.kknn
function and use prediction on test data set, the best k and kernel are selected and directly used in generating the predicted value based on the test dataset.
In contrast, if I use kknn
function and build a loop of different k values, the model generates the corresponding prediction results based on
the test data set each time the k value is changed. Finally, in kknn
+ loop, the best k is selected based on the best actual prediction accuracy rate of test data. In short, the best k train.kknn
selected may not work best on test data.
Thank you.
Upvotes: 4
Views: 20067
Reputation: 24262
For objects returned by kknn
, predict
gives the predicted value or the predicted probabilities of R1
for the single row contained in validation.data
:
predict(knn.fit)
predict(knn.fit, type="prob")
The predict
command also works on objects returned by train.knn
.
For example:
train.kknn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 10,
kernel = "rectangular", scale = TRUE)
class(train.kknn.fit)
# [1] "train.kknn" "kknn"
pred.train.kknn <- predict(train.kknn.fit, data.test)
table(pred.train.kknn, as.factor(data.test$R1))
The train.kknn
command implements a leave-one-out method very close to the loop developed by @vcai01. See the following example:
set.seed(43210)
n <- 500
data.train <- data.frame(R1=rbinom(n,1,0.5), matrix(rnorm(n*10), ncol=10))
library(kknn)
pred.kknn <- array(0, nrow(data.train))
for (i in 1:nrow(data.train)) {
train.data <- data.train[-i,]
validation.data <- data.train[i,]
knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
kernel = "rectangular", scale = TRUE)
pred.kknn[i] <- predict(knn.fit)
}
knn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 40,
kernel = "rectangular", scale = TRUE)
pred.train.kknn <- predict(knn.fit, data.train)
table(pred.train.kknn, pred.kknn)
# pred.kknn
# pred.train.kknn 1 2
# 0 374 14
# 1 9 103
Upvotes: 6