vcai01

Reputation: 53

How to predict in kknn function? library(kknn)

I am trying to use kknn in a loop to build a leave-one-out cross-validation (LOOCV) for a model, and to compare that with train.kknn.

I have split the data into two parts: training (80% data), and test (20% data). In the training data, I exclude one point in the loop to manually create LOOCV.

I think something goes wrong in predict(knn.fit, data.test). I looked for how to predict with kknn in the package documentation and online, but all the examples use "summary(model)" and "table(validation...)" rather than predicting on a separate test set. The call predict(model, dataset) works with objects from train.kknn, so I thought I could use similar arguments with kknn.

I am not sure if there is such a prediction function in kknn. If yes, what arguments should I give?

Look forward to your suggestion. Thank you.

library(kknn)
for (i in 1:nrow(data.train)) {
    train.data <- data.train[-i,]
    validation.data <- data.train[i,]
    knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
                    kernel = "rectangular", scale = TRUE)
    # train.data + validation.data is the 80% data I split.
}

pred.knn <- predict(knn.fit, data.test) # data.test is 20% data.

Here is the error message:

Error in switch(type, raw = object$fit, prob = object$prob, stop("invalid type for prediction")) : EXPR must be a length 1 vector

What I actually want is to compare the leave-one-out CV results of train.kknn with those of kknn in a loop. I have two more questions:

1) in kknn: is it possible to use another set of data as test data to see the knn.fit prediction?

2) in train.kknn: I split the data, use 80% of it for training, and intend to use the remaining 20% for prediction. Is this a correct and common practice?

Or should I just pass the original data (the whole data set) to train.kknn, and write a loop for kknn with data[-i,] for training and data[i,] for validation, so that the two approaches are counterparts?

I find that if I use the training data in the train.kknn function and use prediction on test data set, the best k and kernel are selected and directly used in generating the predicted value based on the test dataset.

In contrast, if I use kknn function and build a loop of different k values, the model generates the corresponding prediction results based on the test data set each time the k value is changed. Finally, in kknn + loop, the best k is selected based on the best actual prediction accuracy rate of test data. In short, the best k train.kknn selected may not work best on test data.

Thank you.

Upvotes: 4

Views: 20067

Answers (1)

Marco Sandri

Reputation: 24262

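As the switch(type, ...) call in the error message shows, predict for objects returned by kknn takes type, not a data set, as its second argument, so data.test is matched to type and the call fails. Called without new data, predict gives the predicted value or the predicted probabilities of R1 for the single row contained in validation.data: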

predict(knn.fit)
predict(knn.fit, type="prob")
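
To get predictions for a separate test set from kknn itself, one option is to pass that set as the test argument, since kknn fits and predicts in a single call. The following is a minimal sketch under assumptions: the simulated data, the seed, and the 80/20 split are stand-ins for the asker's data.train and data.test, and R1 is stored as a factor up front instead of being converted in the formula.

```r
library(kknn)

# Simulated stand-in for the asker's data (assumption, not the real data)
set.seed(1)
n <- 200
dat <- data.frame(R1 = factor(rbinom(n, 1, 0.5)),
                  matrix(rnorm(n * 5), ncol = 5))

# 80% training, 20% test
idx <- sample(n, size = 0.8 * n)
data.train <- dat[idx, ]
data.test  <- dat[-idx, ]

# kknn predicts for whatever is passed as `test`
fit <- kknn(R1 ~ ., train = data.train, test = data.test,
            k = 40, kernel = "rectangular", scale = TRUE)

pred.class <- predict(fit)                 # predicted class per test row
pred.prob  <- predict(fit, type = "prob")  # class probabilities per test row

# Confusion table and accuracy on the test set
table(predicted = pred.class, actual = data.test$R1)
mean(pred.class == data.test$R1)
```

Here the object is refitted whenever predictions for a different data set are needed, because the test rows are fixed at the time kknn is called.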

The predict command also works on objects returned by train.kknn, where it does accept new data as its second argument.
For example:

train.kknn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 10,
                      kernel = "rectangular", scale = TRUE)
class(train.kknn.fit)
# [1] "train.kknn" "kknn"

pred.train.kknn <- predict(train.kknn.fit, data.test)
table(pred.train.kknn, as.factor(data.test$R1))

The train.kknn command implements a leave-one-out method very close to the loop developed by @vcai01. See the following example:

set.seed(43210)
n <- 500
data.train <- data.frame(R1=rbinom(n,1,0.5), matrix(rnorm(n*10), ncol=10))

library(kknn)
pred.kknn <- array(0, nrow(data.train))
for (i in 1:nrow(data.train)) {
    train.data <- data.train[-i,]
    validation.data <- data.train[i,]
    knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
                    kernel = "rectangular", scale = TRUE)
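    # predict() returns a factor; storing it in a numeric array keeps its
    # integer code, so class "0" appears as 1 and class "1" as 2 below
    pred.kknn[i] <- predict(knn.fit)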
}

knn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 40,
                      kernel = "rectangular", scale = TRUE)
pred.train.kknn <- predict(knn.fit, data.train)
table(pred.train.kknn, pred.kknn)

#                pred.kknn
# pred.train.kknn   1   2
#               0 374  14
#               1   9 103
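
On the question of choosing k: one common arrangement is to let train.kknn pick k by leave-one-out on the training data only, and then score the chosen model once on the held-out 20%. The sketch below assumes simulated data, an arbitrary seed, and a search range of kmax = 50; none of these come from the original post.

```r
library(kknn)

# Simulated stand-in for the asker's data (assumption, not the real data)
set.seed(2)
n <- 300
dat <- data.frame(R1 = factor(rbinom(n, 1, 0.5)),
                  matrix(rnorm(n * 5), ncol = 5))

idx <- sample(n, size = 0.8 * n)
data.train <- dat[idx, ]
data.test  <- dat[-idx, ]

# Leave-one-out CV over k = 1..50, using the training data only
cv.fit <- train.kknn(R1 ~ ., data.train, kmax = 50,
                     kernel = "rectangular", scale = TRUE)

cv.fit$best.parameters   # kernel and k with the lowest LOO misclassification
head(cv.fit$MISCLASS)    # LOO misclassification rate for the first few k

# Single evaluation on the held-out test set with the selected k
pred.test <- predict(cv.fit, data.test)
mean(pred.test == data.test$R1)
```

The k chosen this way need not be the one that scores best on the test set. Picking k by test accuracy instead, as in the kknn loop described in the question, uses the test data for tuning, so the resulting test accuracy tends to be optimistic.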

Upvotes: 6
