Joseph Kim

Reputation: 79

'train' and 'class' have different lengths error in R

I want to run a kNN classification with k = 3. I would like to predict the dependent variable "diabetes" in the validation set using the training set, and then calculate the accuracy.

But I got this error message:

Error in knn(train = TrainXNormDF, test = ValidXNormDF, cl = MLdata2[, : 'train' and 'class' have different lengths

I tried to work around it with the following approach, but it did not solve the problem:

for(i in ((length(MLValidY) + 1):length(TrainXNormDF)))+(MLValidY = c(MLValidY, 0))

What can I do about it? Please help.

My code is as follows:

install.packages("mlbench")
install.packages("gbm")

library(mlbench)
library(gbm)

data("PimaIndiansDiabetes2")
head(PimaIndiansDiabetes2)

MLdata <- as.data.frame(PimaIndiansDiabetes2)
head(MLdata)
str(MLdata)
View(MLdata)

any(is.na(MLdata))
sum(is.na(MLdata))

MLdata2 <- na.omit(MLdata)
any(is.na(MLdata2))
sum(is.na(MLdata2))
View(MLdata2)

MLIdx <- sample(1:3, size = nrow(MLdata2), prob = c(0.6, 0.2, 0.2), replace = TRUE)

MLTrain <- MLdata2[MLIdx == 1,]
MLValid <- MLdata2[MLIdx == 2,]
MLTest <- MLdata2[MLIdx == 3,]

head(MLTrain)
head(MLValid)
head(MLTest)

str(MLTrain)
str(MLValid)
str(MLTest)


MLTrainX <- MLTrain[ , -9]
MLValidX <- MLValid[ , -9]
MLTestX <- MLTest[ , -9]

MLTrainY <- as.data.frame(MLTrain[ , 9])
MLValidY <- as.data.frame(MLValid[ , 9])
MLTestY <- as.data.frame(MLTest[ , 9])

View(MLTrainX)
View(MLTrainY)

library(caret)

NormValues <- preProcess(MLTrainX, method = c("center", "scale"))

TrainXNormDF <- predict(NormValues, MLTrainX)
ValidXNormDF <- predict(NormValues, MLValidX)
TestXNormDF <- predict(NormValues, MLTestX)

head(TrainXNormDF)
head(ValidXNormDF)
head(TestXNormDF)


install.packages('FNN')
library(FNN)
library(class)

NN <- knn(train = TrainXNormDF, 
      test = ValidXNormDF,
      cl = MLValidY,
      k = 3)

Thank you

Upvotes: 1

Views: 317

Answers (2)

Kat

Reputation: 18714

As @rw2 stated, the problem is the length of cl. I think you meant to use MLTrainY, not MLValidY. Even then, a single-column data frame can still cause shape problems, because cl needs to be a vector rather than a data frame. You could go back to the original data to make sure that you use the right content, like so:

NN <- knn(train = TrainXNormDF, 
          test = ValidXNormDF,
          cl = MLdata2[MLIdx == 1,]$diabetes, # shape no longer an issue
          k = 3)
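
To illustrate the shape issue, here is a minimal sketch using a small made-up data frame (train_df and its values are hypothetical stand-ins, not the asker's data). It suggests why a one-column data frame can trip a length check: length() on a data frame counts columns, not rows.

```r
# Hypothetical toy stand-in for a training split (values are made up).
train_df <- data.frame(
  glucose  = c(148, 85, 183, 89),
  diabetes = factor(c("pos", "neg", "pos", "neg"))
)

# One-column data frame, built the same way as MLTrainY in the question.
labels_df <- as.data.frame(train_df[, 2])

# Plain factor vector, which is the shape a class-label argument usually needs.
labels_vec <- train_df$diabetes

length(labels_df)   # counts columns of the data frame
length(labels_vec)  # counts observations
nrow(train_df)      # number of training rows to compare against
```

Here only labels_vec has the same length as the number of training rows, so extracting the column with $ (or [[ ]]) is the safer choice.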

Upvotes: 0

rw2

Reputation: 1793

Your cl variable is not the same length as your train variable. MLValidY only has 74 observations, while TrainXNormDF has 224.

cl should provide the true classification for every row in your training set.

Furthermore, cl is a data.frame instead of a vector.

Try the following:

NN <- knn(train = TrainXNormDF, 
      test = ValidXNormDF,
      cl = MLTrainY$`MLTrain[, 9]`,
      k = 3)
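
Since the question also asks about accuracy, here is a rough, self-contained sketch of the whole flow. It uses the built-in iris data instead of PimaIndiansDiabetes2 so it runs without extra downloads, and the names train_x, valid_x, train_y, valid_y and the 70/30 split are my own assumptions, not the asker's setup. The features are left unscaled to keep it short.

```r
library(class)  # provides knn(); a recommended package in standard R installs

set.seed(42)  # make the random split reproducible

# Assumed two-way split, same sampling style as in the question.
idx <- sample(1:2, size = nrow(iris), prob = c(0.7, 0.3), replace = TRUE)

train_x <- iris[idx == 1, 1:4]      # training predictors
valid_x <- iris[idx == 2, 1:4]      # validation predictors
train_y <- iris$Species[idx == 1]   # training labels as a factor vector
valid_y <- iris$Species[idx == 2]   # validation labels, used only for scoring

# cl must have one label per training row.
stopifnot(length(train_y) == nrow(train_x))

pred <- knn(train = train_x, test = valid_x, cl = train_y, k = 3)

# Accuracy: share of validation rows whose predicted class matches the truth.
accuracy <- mean(pred == valid_y)
accuracy

# Confusion table for a closer look at the errors.
table(Predicted = pred, Actual = valid_y)
```

With the asker's objects, the same idea would be to compare NN against MLValid$diabetes, with the validation labels used only for scoring and never passed as cl.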

Upvotes: 2
