Jose C

Reputation: 13

Error 'train' and 'class' have different lengths

I have been trying to use the knn function to start my predictions. However, when I run the code it throws this error:

Error in knn(data.frame(tr5_train), data.frame(tr5_test), cl = pred_train_labels, : 'train' and 'class' have different lengths

I have checked that all data sets are data frames, and I have tried passing the label as a vector, with no success.

The following is the code I've used:

test_tr5_no_target<- test_tr5[-2]


tr5_train<- test_tr5_no_target[1:74475, , drop = FALSE]

tr5_test<- test_tr5_no_target[74476:93094, , drop = FALSE]

pred_train_labels<- test_tr5[1:74475, 2] 

pred_test_labels<- test_tr5[74476:93094, 2]


#install.packages("class")

library(class)

##ensure all data is a dataframe

as.data.frame(tr5_train)

as.data.frame(tr5_test)

as.data.frame(pred_train_labels)


pred1<- knn(data.frame(tr5_train), data.frame(tr5_test), cl = pred_train_labels, k = 5)

Keep in mind that column 2, which I use for the labels, is the numeric Target feature. I have searched extensively and have not been able to find what is throwing this error. Is there anything I may be doing incorrectly?

Thanks for all the help, I really appreciate it! (Unfortunately I can't share the data itself, as it is restricted.)

-Jose C.

Upvotes: 0

Views: 1557

Answers (1)

Peter_Evan

Reputation: 947

To answer your question directly: you want your label (here pred_train_labels) as a vector and NOT a data frame. We can recreate your error using the mtcars data set.

library('tidyverse')
library('class')
set.seed(1)

x <- mtcars
target <- x[-1]

size <- floor(0.75 * nrow(x))

train_ind <- sample(seq_len(nrow(x)), size = size)

train <- x[train_ind, ]
test <- x[-train_ind, ]

label <- as.data.frame(x[1][train_ind, ]) # problem is here: the label is a data frame

test <- knn(train, test, cl = label, k = 5)

Error in knn(train, test, cl = label, k = 5) : 
  'train' and 'class' have different lengths
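As far as I can tell, the error comes from knn comparing length(cl) with the number of rows in train. That is my reading of the function, so treat it as an assumption. What is certain is that length() of a data frame counts columns, not rows, so a one-column label data frame has length 1. A minimal sketch (label_df and label_vec are names made up for this illustration):

```r
# Same 24 values, once wrapped in a data frame and once as a plain vector.
label_df  <- data.frame(mpg = mtcars$mpg[1:24])
label_vec <- mtcars$mpg[1:24]

length(label_df)   # 1: a data frame's length is its number of columns
nrow(label_df)     # 24
length(label_vec)  # 24: matches the number of training rows
```

So a data frame label looks like it has "length 1" next to 24 training rows, which would explain the mismatch.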

If we pass the label as a plain vector instead, knn runs. Setting prob = TRUE and calling attributes() on the result then shows, among other things, the factor levels of the predictions:

train_ind <- sample(seq_len(nrow(x)), size = size)

train <- x[train_ind, ]
test <- x[-train_ind, ]

label <- x[1][train_ind, ] #NOT a dataframe

test <- knn(train,test,cl = label, k = 5, prob = TRUE)
attributes(test)

$`levels`
 [1] "10.4" "14.3" "14.7" "15"   "15.2" "15.8" "16.4" "17.3"
 [9] "17.8" "18.7" "19.2" "19.7" "21"   "21.4" "22.8" "24.4"
 [17] "26"   "30.4" "32.4"

The example on the knn help page (?knn) shows this, too: there, cl is a factor vector.
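Applied to a layout like yours, a rough sketch could look as follows. The data here is synthetic (df, train_x, test_x and train_y are hypothetical names, and 100 rows stand in for your restricted data). One guess worth checking: if test_tr5 is a tibble, test_tr5[1:74475, 2] stays a one-column tibble rather than dropping to a vector, while [[2]] returns a vector for both data frames and tibbles.

```r
library(class)

set.seed(42)

# Synthetic stand-in for the restricted data; column 2 is the target.
df <- data.frame(
  f1     = rnorm(100),
  Target = sample(c(0, 1), 100, replace = TRUE),
  f2     = rnorm(100)
)

# Features without the target column, split by row position.
features <- df[-2]
train_x  <- features[1:80, , drop = FALSE]
test_x   <- features[81:100, , drop = FALSE]

# [[ ]] extracts the column as a vector; factor() makes the classes explicit.
train_y <- factor(df[[2]][1:80])

# Sanity check before calling knn: one label per training row.
stopifnot(length(train_y) == nrow(train_x))

pred <- knn(train_x, test_x, cl = train_y, k = 5)
length(pred)  # one prediction per test row
```

Running the stopifnot() check on your own objects before the knn call should tell you quickly whether the label is the culprit.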

Upvotes: 1
