Reputation: 13
I have been trying to use the knn() function to start my predictions; however, when I run the code it throws the error:
Error in knn(data.frame(tr5_train), data.frame(tr5_test), cl = pred_train_labels, : 'train' and 'class' have different lengths
I have checked that all data sets are data frames and tried to pass the label as a vector, with no success.
The following is the code I've used:
test_tr5_no_target <- test_tr5[-2]
tr5_train <- test_tr5_no_target[1:74475, , drop = FALSE]
tr5_test <- test_tr5_no_target[74476:93094, , drop = FALSE]
pred_train_labels <- test_tr5[1:74475, 2]
pred_test_labels <- test_tr5[74476:93094, 2]
#install.packages("class")
library(class)
##ensure all data is a dataframe
as.data.frame(tr5_train)
as.data.frame(tr5_test)
as.data.frame(pred_train_labels)
pred1 <- knn(data.frame(tr5_train), data.frame(tr5_test), cl = pred_train_labels, k = 5)
Keep in mind that the label, column 2, is the numeric Target feature. I have searched everywhere and have not been able to find what is causing this error. Is there anything I may be doing incorrectly?
Thanks for all the help, I really appreciate it! (Unfortunately I can't share the data itself, as it is restricted.)
-Jose C.
Upvotes: 0
Views: 1557
Reputation: 947
To answer your question directly: you want your label (here pred_train_labels) as a vector and NOT a data frame. We can recreate your error using the mtcars data set.
library('class')
set.seed(1)
x <- mtcars
size <- floor(0.75 * nrow(x))                       # 75% of the rows go to training
train_ind <- sample(seq_len(nrow(x)), size = size)  # random training row indices
train <- x[train_ind, ]    # note: mpg, the label column, is still among the predictors
test <- x[-train_ind, ]
label <- as.data.frame(x[1][train_ind, ]) # problem is here: cl is a data frame
pred <- knn(train, test, cl = label, k = 5)
Error in knn(train, test, cl = label, k = 5) :
'train' and 'class' have different lengths
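The message makes sense once you look at what length() returns for each object. As far as I can tell from the class source, knn() stops when length(cl) differs from nrow(train), and the length of a data frame is its number of columns, not its number of rows. A small sketch with mtcars (the names v and df are only for illustration):

```r
# First 24 mpg values as a plain numeric vector
v <- mtcars$mpg[1:24]

# The same values wrapped in a one-column data frame
df <- as.data.frame(v)

length(v)   # 24: one element per observation
length(df)  # 1: a data frame's length is its column count
nrow(df)    # 24: the rows are still there, but knn() compares length(cl)
```

So with 24 training rows, a one-column data frame as cl looks like 1 label against 24 observations.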
By passing the label as a vector and then inspecting the attributes of the resulting knn object, we get output:
train_ind <- sample(seq_len(nrow(x)), size = size)
train <- x[train_ind, ]
test <- x[-train_ind, ]
label <- x[1][train_ind, ] # a plain vector, NOT a data frame
pred <- knn(train, test, cl = label, k = 5, prob = TRUE) # keep test intact, store predictions separately
attributes(pred)
$`levels`
[1] "10.4" "14.3" "14.7" "15" "15.2" "15.8" "16.4" "17.3"
[9] "17.8" "18.7" "19.2" "19.7" "21" "21.4" "22.8" "24.4"
[17] "26" "30.4" "32.4"
The example in ?knn shows this, too: there cl is a factor, not a data frame.
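Applied to the code in the question, a sketch might look like the one below. Since the real data is restricted, toy is a made-up stand-in, and its column names and sizes are assumptions. If test_tr5 happens to be a tibble (also an assumption), test_tr5[1:74475, 2] stays a one-column table instead of dropping to a vector, which would trigger exactly this error. Extracting the column with [[ ]] returns a vector for both data frames and tibbles.

```r
library(class)

# Hypothetical stand-in for the restricted data; column 2 plays the role of the target
set.seed(42)
toy <- data.frame(
  f1     = rnorm(100),
  target = sample(c(0, 1), 100, replace = TRUE),
  f2     = rnorm(100)
)

n_train  <- 80
features <- toy[-2]                                   # drop the target column
train_x  <- features[1:n_train, , drop = FALSE]
test_x   <- features[(n_train + 1):nrow(toy), , drop = FALSE]

# [[ ]] returns the column itself (a vector), never a one-column table
train_y <- factor(toy[[2]][1:n_train])

# Sanity check before calling knn(): one label per training row, no dim attribute
stopifnot(is.null(dim(train_y)), length(train_y) == nrow(train_x))

pred <- knn(train_x, test_x, cl = train_y, k = 5)
length(pred)  # one prediction per test row
```

Note that knn() treats cl as class labels, so a numeric target is handled as a set of distinct classes rather than as a continuous value.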
Upvotes: 1