Jon

Reputation: 345

KNN in R: 'train and class have different lengths'?

Here is my code:

train_points <- read.table("kaggle_train_points.txt", sep="\t")
train_labels <- read.table("kaggle_train_labels.txt", sep="\t")
test_points <- read.table("kaggle_test_points.txt", sep="\t")

#uses package 'class'
library(class)
knn(train_points, test_points, train_labels, k = 5);

dim(train_points) is 42000 x 784
dim(train_labels) is 42000 x 1

I don't see the issue, but I'm getting this error:

Error in knn(train_points, test_points, train_labels, k = 5) :
'train' and 'class' have different lengths.

What's the problem?

Upvotes: 17

Views: 54588

Answers (8)

anuanand

Reputation: 400

I followed the code as given in the book, but it throws an error because of mismatched lengths (one argument is a data frame, the other is the vector returned by knn). Nothing on this page worked exactly for me, but the answers made it clear that vectors are needed for the comparison.

This throws an error:

gmodels::CrossTable(x = wbcd_test_labels, # actuals
                 y = wbcd_test_pred,   # predicted
                 prop.chisq = FALSE)

The following works:

gmodels::CrossTable(x = wbcd_test_labels$diagnosis, # actuals
                y = wbcd_test_pred,   # predicted
                prop.chisq = FALSE)

Using $ on x extracts the column as a vector, so the lengths match.

Additionally, when running knn, the cl parameter should also be a vector. Save the labels in a vector, or use labelDF$Class_label; otherwise there will be a length mismatch:

wbcd_test_pred <- knn(train = wbcd_train,
                      test = wbcd_test,
                      cl = wbcd_train_labels$diagnosis, # note this
                      k = 21)

Hope this helps beginners like me.

Upvotes: 0

Dileep Desai

Reputation: 1

Uninstall previous versions of R and install a version newer than 4.0. It will work.

Upvotes: -1

Russell

Reputation: 1

I had a similar error when I was reading the data into a tibble (read_csv). When I switched to read.csv, the code worked.

Upvotes: 0

Axel Ullern

Reputation: 71

I had the same issue when trying to apply knn to the Wisconsin breast cancer diagnosis dataset. The problem was that the cl argument needs to be a vector or factor. My mistake was to write cl=labels: I thought this was the vector to be predicted, but it was in fact a one-column data frame. The solution was to use the following syntax: knn(train, test, cl=labels$diagnosis, k=21), where diagnosis is the header of the single column in the data frame labels. It worked well. Hope this helps!

Upvotes: 5

T.j. Gray

Reputation: 31

Try converting the data to a data frame using as.data.frame(). I was having the same problem, and afterwards it worked fine:

train_pointsdf <- as.data.frame(train_points)
train_labelsdf <- as.data.frame(train_labels)
test_pointsdf <- as.data.frame(test_points)

Upvotes: 3

Mark K.

Reputation: 2099

Simply set drop = TRUE when extracting cl from the data frame. This drops the redundant dimension of a single-column selection, so you get a plain vector instead of a one-column data frame:

cl = train_labels[,1, drop = TRUE]
knn(train_points, test_points, cl, k = 5)

Upvotes: 1

crocodile

Reputation: 129

I recently ran into a very similar issue. I wanted to use only a single column as a predictor. In such cases, when selecting the column, you have to remember the drop argument and set it to FALSE. The knn() function expects matrices or data frames as its train and test arguments, not vectors.

knn(train = trainSet[, 2, drop = FALSE], test = testSet[, 2, drop = FALSE], cl = trainSet$Direction, k = 5)
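To make the effect of drop visible, here is a minimal sketch on made-up data. The column names Lag1 and Lag2 and the random values are assumptions for illustration only; just trainSet, testSet and Direction come from the call above.

```r
library(class)  # provides knn()

set.seed(42)
# Toy data: the values and the Lag1/Lag2 column names are invented.
trainSet <- data.frame(Lag1 = rnorm(20),
                       Lag2 = rnorm(20),
                       Direction = factor(rep(c("Up", "Down"), 10)))
testSet <- data.frame(Lag1 = rnorm(5), Lag2 = rnorm(5))

# Default single-column selection returns a plain vector (no dim attribute).
is.null(dim(trainSet[, 2]))        # TRUE

# With drop = FALSE the result stays a 20 x 1 data frame.
dim(trainSet[, 2, drop = FALSE])   # 20 1

pred <- knn(train = trainSet[, 2, drop = FALSE],
            test  = testSet[, 2, drop = FALSE],
            cl    = trainSet$Direction,
            k     = 5)
length(pred)  # one prediction per test row: 5
```

Note that cl goes the other way: it is selected with $ so that it is a vector (here a factor), while train and test keep their two dimensions.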

Upvotes: 3

csgillespie

Reputation: 60462

Without access to the data, it's really hard to help. However, I suspect that train_labels should be a vector. So try

cl = train_labels[,1]
knn(train_points, test_points, cl, k = 5)

Also double check:

dim(train_points)
dim(test_points)
length(cl)
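For anyone who wants to reproduce this without the Kaggle files, here is a small sketch on random data. The shapes and values are made up; the point is only that length() of a data frame counts its columns, which is presumably why a 42000 x 1 data frame of labels does not match the 42000 training rows.

```r
library(class)  # provides knn()

set.seed(1)
# Toy stand-ins for the real data: 10 training rows, 3 test rows, 4 features.
train_points <- as.data.frame(matrix(rnorm(40), nrow = 10))
test_points  <- as.data.frame(matrix(rnorm(12), nrow = 3))
train_labels <- data.frame(V1 = rep(c("a", "b"), each = 5))  # 10 x 1 data frame

length(train_labels)       # 1: a data frame's length is its number of columns
length(train_labels[, 1])  # 10: the column extracted as a vector

# Passing the data frame as cl reproduces the error from the question.
msg <- tryCatch(knn(train_points, test_points, train_labels, k = 3),
                error = function(e) conditionMessage(e))
print(msg)

# Passing the column as a vector (here converted to a factor) works.
cl <- factor(train_labels[, 1])
pred <- knn(train_points, test_points, cl, k = 3)
length(pred)               # one prediction per test row: 3
```

If the labels were read into a tibble rather than a base data frame, train_labels[, 1] stays a one-column tibble, so train_labels[[1]] is the safer way to get a vector in that case.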

Upvotes: 21
