Constantly getting different predictions for a small data set when using KNN (k = 2) in R

Consider this regression problem with the following training set:

x    y
-1   0
 0   1
 2   2
 3   1

I want to compute the 2-nearest-neighbour prediction for each object. However, I get different predictions every time I call the knn function. Is this expected? Here is the code I'm using:

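library(class)
# Training and test sets are identical: one feature, four points
test <- train <- matrix(c(-1, 0, 2, 3), ncol = 1)
# Labels for the four training points (knn() treats them as a factor)
cl <- c(0, 1, 2, 1)
knn(train, test, cl, k = 2)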

Output:

> knn(train, test, cl, k=2)
[1] 1 1 2 2
Levels: 0 1 2
> knn(train, test, cl, k=2)
[1] 0 0 1 2
Levels: 0 1 2
> knn(train, test, cl, k=2)
[1] 1 1 1 2
Levels: 0 1 2
> knn(train, test, cl, k=2)
[1] 0 0 1 2
Levels: 0 1 2

Would really appreciate any clarification.

Answers (2)

Simon Urbanek

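In knn, ties are broken at random. Because your test set is identical to your training set, each point's nearest neighbour is the point itself (an exact match carrying the correct label), and its second-nearest neighbour is the closest other point, which in this data always carries a different label. Every vote is therefore a 1:1 tie, so the result is always a random pick between the actual label and the wrong one.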
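
The vote can be reproduced by hand. This is a simplified sketch of the voting step, not class::knn's own code: it takes the k closest points by absolute distance and counts their labels, and it ignores knn's rule of including every point tied at the kth distance (no such distance ties occur in this data). The names `votes` and `tied` are illustrative.

```r
x  <- c(-1, 0, 2, 3)   # single feature; training and test points are the same
cl <- c(0, 1, 2, 1)    # class labels
k  <- 2

# For each point, count the labels of its k nearest points (1-D distance |x_i - x_j|)
votes <- lapply(seq_along(x), function(i) {
  nn <- order(abs(x - x[i]))[seq_len(k)]   # indices of the k closest points (itself first)
  table(factor(cl[nn], levels = sort(unique(cl))))
})
names(votes) <- x

# Labels sharing the maximum vote; more than one label means a tie
tied <- lapply(votes, function(v) names(v)[v == max(v)])
print(tied)
```

Every point ends up with two labels on one vote each, which is exactly the situation knn resolves at random.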

You can see this empirically by running the experiment many times and looking at the results: each test point will have exactly two different outcomes, in roughly equal proportions.
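
A minimal way to run that experiment, assuming the same data as in the question. `set.seed()` is only there to make the illustration reproducible; it does not remove the tie, it just fixes which way each tie falls. The object names `runs` and `props` are illustrative.

```r
library(class)

train <- test <- matrix(c(-1, 0, 2, 3), ncol = 1)
cl <- c(0, 1, 2, 1)

set.seed(1)  # fixes the random tie-breaking so the illustration is reproducible
# 2000 runs: each column of `runs` is one call, each row one test point
runs <- replicate(2000, as.character(knn(train, test, cl, k = 2)))

# Proportion of runs in which each label was predicted, per test point
props <- t(apply(runs, 1, function(r)
  prop.table(table(factor(r, levels = c("0", "1", "2"))))))
rownames(props) <- c(-1, 0, 2, 3)
print(round(props, 2))
```

With this many runs, each point's two candidate labels should each appear close to half the time, and the third label never.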

TBSRounder

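Despite the code not working, my guess is that there is a tie, and in that case knn chooses randomly, which is why you're seeing different results each time you run it. An odd k such as k = 3 prevents a 1:1 tie between two classes, but with three classes a three-way 1:1:1 tie is still possible: in this data, at k = 3 the points at -1 and 0 each get one vote for 0, 1 and 2, so their predictions would still vary.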
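
Whether k = 3 removes the randomness can be checked directly by repeating the call and listing the distinct predictions per point (same data as in the question; the names `runs3` and `distinct` are illustrative). Worked out by hand, at k = 3 the points at 2 and 3 get a 2:1 vote for label 1, while the points at -1 and 0 each see labels 0, 1 and 2 once.

```r
library(class)

train <- test <- matrix(c(-1, 0, 2, 3), ncol = 1)
cl <- c(0, 1, 2, 1)

set.seed(42)  # reproducible illustration only
runs3 <- replicate(500, as.character(knn(train, test, cl, k = 3)))

# Distinct predictions seen for each test point across the 500 runs
distinct <- lapply(seq_len(nrow(runs3)), function(i) sort(unique(runs3[i, ])))
names(distinct) <- c(-1, 0, 2, 3)
print(distinct)
```

Only the last two predictions become deterministic; the first two still fall randomly among all three labels.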
