user3291389

Reputation: 23

Error: Sets of levels in train and test don't match (knncat R)

I am trying to do knn classification using knncat in R since I have categorical attributes in my data set.

knncat(FinalData, FinalTestData, k=10, classcol = 15)

When I execute the above statement, it gives me the error: Sets of levels in train and test do not match.

On checking the levels of all the attributes, I did find a difference. I have a country attribute that takes values from 1 to 41 in the training data set.

However, in the test data set one particular country never appears, and this causes the error. How am I supposed to deal with that?

Upvotes: 1

Views: 825

Answers (2)

Dylan K

Reputation: 1

Perhaps I am wrong, but wouldn't this still be problematic, since the KNN algorithm relies on Euclidean distance calculations? Wouldn't you still need to create a binary variable for each level of your categorical features, which would be an issue when certain levels do not appear in both the training and test sets?

Could someone perhaps enlighten me on this?

Also, as a note, this is meant to be more of a spur than a hijack.
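To make the concern concrete, here is a minimal sketch using base R's model.matrix to build one binary column per level. It only illustrates generic one-hot encoding; it is an assumption that this resembles what a distance-based method would need, and knncat itself may treat categorical variables differently. The names train, test, and x are made up for the example.

```r
# Toy factors: level "c" never appears in the test data
train <- factor(c("a", "b", "c"))
test  <- factor(c("a", "b"))

# One indicator column per level; the unaligned test factor has one fewer
X_train    <- model.matrix(~ x - 1, data.frame(x = train))
X_test_raw <- model.matrix(~ x - 1, data.frame(x = test))

# Re-encode test with the training levels so the columns line up
test_aligned <- factor(test, levels = levels(train))
X_test <- model.matrix(~ x - 1, data.frame(x = test_aligned))

# TRUE: both matrices now have the same indicator columns
identical(colnames(X_train), colnames(X_test))
```

Once the levels are aligned, the indicator column for the unseen level is simply all zeros in the test matrix, so the two matrices remain comparable.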

Upvotes: 0

Jaehyeon Kim

Reputation: 1417

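I'm not sure, but you may be able to match the factor levels as below. Note that assigning levels(test) <- levels(train) renames levels by position, so it is only safe when the missing level happens to come last; re-creating the factor with an explicit levels argument matches by label instead.

train <- factor(c("a","b","c"))
test <- factor(c("a","b"))
# Re-encode test using the training levels (matched by label, not position)
test <- factor(test, levels = levels(train))
test
[1] a b
Levels: a b c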
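The same idea can be applied to every factor column of a data frame before calling knncat. The following is only a sketch: train_df and test_df are made-up stand-ins for FinalData and FinalTestData, and align_levels is a hypothetical helper, not part of knncat.

```r
# Hypothetical stand-ins for the training and test data frames
train_df <- data.frame(
  country = factor(c("1", "2", "3", "2")),
  age     = c(30, 41, 25, 37),
  class   = factor(c("yes", "no", "yes", "no"))
)
test_df <- data.frame(
  country = factor(c("1", "3")),   # country "2" never appears here
  age     = c(52, 29),
  class   = factor(c("no", "yes"))
)

# Hypothetical helper: give each factor column in test the levels used in train
align_levels <- function(train, test) {
  for (col in names(train)) {
    if (is.factor(train[[col]]) && col %in% names(test)) {
      test[[col]] <- factor(as.character(test[[col]]),
                            levels = levels(train[[col]]))
    }
  }
  test
}

test_df <- align_levels(train_df, test_df)
levels(test_df$country)
```

Be aware that any value present in the test data but absent from the training levels becomes NA under this approach, so it is worth checking for NAs afterwards.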

Upvotes: 2
