Reputation: 41
Consider this regression problem with the following training set:
I want to compute the 2-nearest-neighbour prediction for each object. However, I get different predictions every time I call the knn function. Should this be the case? Here is the code I'm using:
library(class)
test <- train <- matrix(c(-1, 0, 2, 3),,1)
cl <- c(0, 1, 2, 1)
knn(train, test, cl, k=2)
Output:
> knn(train, test, cl, k=2)
[1] 1 1 2 2
Levels: 0 1 2
> knn(train, test, cl, k=2)
[1] 0 0 1 2
Levels: 0 1 2
> knn(train, test, cl, k=2)
[1] 1 1 1 2
Levels: 0 1 2
> knn(train, test, cl, k=2)
[1] 0 0 1 2
Levels: 0 1 2
Would really appreciate any clarification.
Upvotes: 4
Views: 2290
Reputation: 13932
In knn, ties are broken at random. The way you have it set up (the test set is identical to the training set and k=2), the vote for each point always contains exactly one correct label (the exact match) and one incorrect label (the nearest other point), so the result is always a random pick between the actual label and the wrong one.
You can see that empirically by running the experiment many times and looking at the results - each row will have exactly two different outcomes in roughly the same proportion.
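The experiment described above could be sketched roughly as follows. This assumes the same data as in the question; the number of repetitions (1000), the seed, and the variable names runs and counts are arbitrary choices for illustration, not part of the original post.

```r
library(class)

# Same data as in the question; test is identical to train.
train <- matrix(c(-1, 0, 2, 3), ncol = 1)
test  <- train
cl    <- factor(c(0, 1, 2, 1))

set.seed(1)  # assumption: fixed only so this sketch is repeatable

# Each column holds one call to knn(); each row is one test point.
runs <- replicate(1000, as.character(knn(train, test, cl, k = 2)))

# Count how often each label was predicted for each test point.
# Result: one row per label, one column per test point.
counts <- apply(runs, 1, function(x) table(factor(x, levels = levels(cl))))
counts
```

Each column of counts should contain only two non-zero entries: the point's own label and the label of its nearest other point.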
Upvotes: 6
Reputation: 348
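Despite the code not working, my guess is that there is a tie, and in that case it chooses randomly, which is why you're seeing different results each time you use it. Choosing an odd k such as k=3 rules out two-way ties, but with three classes it does not rule out every tie: for the first two points here, the three nearest neighbours carry three different labels, so those predictions would still be picked at random.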
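One way to check whether a particular k leaves a tie on this data is the prob = TRUE argument of knn, which attaches the winning class's share of the votes to the result. The following is only a sketch under the question's data; the helper name vote_share is made up for illustration.

```r
library(class)

train <- matrix(c(-1, 0, 2, 3), ncol = 1)
test  <- train
cl    <- factor(c(0, 1, 2, 1))

# Hypothetical helper: winning vote share per test point for a given k.
vote_share <- function(k) {
  pred <- knn(train, test, cl, k = k, prob = TRUE)
  attr(pred, "prob")
}

share_k2 <- vote_share(2)  # 0.5 means the two votes disagreed
share_k3 <- vote_share(3)  # 1/3 means three different labels voted

share_k2
share_k3
```

A share of 1/k means every neighbour voted for a different class, so the prediction for that point is a random pick. Calling set.seed() before knn should make such picks repeatable, but it does not make them any more meaningful.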
Upvotes: 0