Reputation: 23
I am trying to do knn classification using knncat in R since I have categorical attributes in my data set.
knncat(FinalData, FinalTestData, k=10, classcol = 15)
When I execute the above statement, it gives me the error: Sets of levels in train and test do not match.
On checking the levels of all the attributes, I did find a difference. I have a country attribute that takes values from 1 to 41 in the training data set.
However, in the test data set one particular country never appears, and this causes the error. How am I supposed to deal with that?
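The level check described above can be sketched roughly as follows. The two small data frames are hypothetical stand-ins for FinalData and FinalTestData, and the column name country is assumed for illustration only.

```r
# Hypothetical stand-ins for FinalData / FinalTestData; only a factor column is needed.
train <- data.frame(country = factor(c("1", "2", "3", "2")))
test  <- data.frame(country = factor(c("1", "3")))

# For each factor column, list the levels present in train but absent from test.
factor_cols <- names(train)[sapply(train, is.factor)]
missing_in_test <- lapply(factor_cols, function(col) {
  setdiff(levels(train[[col]]), levels(test[[col]]))
})
names(missing_in_test) <- factor_cols

missing_in_test
```

Any column with a non-empty entry in missing_in_test is a candidate for the mismatch that knncat complains about.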
Upvotes: 1
Views: 825
Reputation: 1
Perhaps I am wrong, but wouldn't this still be problematic, since the KNN algorithm relies on Euclidean distance calculations?
Wouldn't you still need to create a binary variable for each level of your categorical features? That would be an issue here, given that certain levels might not appear in both the training and test sets.
Could someone enlighten me on this?
Also, as a note, this is meant to be more of a spur than a hijack.
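For what it's worth, here is a minimal sketch of how binary (dummy) columns could be kept consistent between the two sets, assuming a distance-based kNN that needs numeric input. The data frames and the country column are made up for illustration; model.matrix is from base R's stats package. The idea is to fix the factor levels from the training set before encoding, so that a level missing from the test set still gets an all-zero column.

```r
# Hypothetical toy data: level "c" never appears in the test set.
train <- data.frame(country = c("a", "b", "c"), stringsAsFactors = FALSE)
test  <- data.frame(country = c("a", "b"), stringsAsFactors = FALSE)

# Take the set of levels from the training data and apply it to both sets.
all_levels <- sort(unique(train$country))
train$country <- factor(train$country, levels = all_levels)
test$country  <- factor(test$country,  levels = all_levels)

# "- 1" drops the intercept so every level gets its own 0/1 column.
x_train <- model.matrix(~ country - 1, data = train)
x_test  <- model.matrix(~ country - 1, data = test)

colnames(x_test)
```

With the levels fixed up front, both matrices have the same columns in the same order, so distances computed on them are comparable.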
Upvotes: 0
Reputation: 1417
I'm not sure, but you could match the factor levels as shown below.
train <- factor(c("a","b","c"))
test <- factor(c("a","b"))
test <- factor(test, levels = levels(train))  # matches by label; levels(test) <- levels(train) would relabel by position
test
[1] a b
Levels: a b c
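To apply the same idea to every factor column of a data frame, something like the sketch below might work. The data frames train_df and test_df are hypothetical. It uses factor(x, levels = ...) rather than levels<-, because the former matches values by label while the latter relabels by position and could silently turn one country into another when the missing level is not the last one.

```r
# Hypothetical toy data: the test set lacks level "b" of country.
train_df <- data.frame(country = factor(c("a", "b", "c")),
                       class   = factor(c("x", "y", "x")))
test_df  <- data.frame(country = factor(c("a", "c")),
                       class   = factor(c("y", "x")))

# Give each factor column in test the same level set as the matching train column.
for (col in names(train_df)) {
  if (is.factor(train_df[[col]])) {
    test_df[[col]] <- factor(test_df[[col]], levels = levels(train_df[[col]]))
  }
}

levels(test_df$country)
as.character(test_df$country)
```

The values in test_df stay unchanged; only the set of allowed levels is widened to match the training data.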
Upvotes: 2