Ashley A Holmes

Reputation: 69

Error in knn 'train' and 'class' have different lengths

I'm trying to use the knn function (from the class package) on my dataset. It has 5 columns of features, and the 6th is what I want to be able to predict. I'm doing a 70/30 split.

Here's my code:

> ind <- createDataPartition(CSD$Caesarian, p=0.70, list=FALSE)
> csd_train <- CSD[ ind,]
> csd_test <- CSD[-ind,]
> c1 <- CSD[1:6,-c(1,2,3,4,5)]
> knn(train, test, c1, k=2, prob=TRUE)

But I'm getting this error.

Error in knn(train, test, c1, k = 2, prob = TRUE) : 
  'train' and 'class' have different lengths

I looked at other threads and tried their suggested solutions (KNN in R: 'train and class have different lengths'?).

I tried the following, but I'm still getting errors:

> c1 = as.factor(c1)
> dim(csd_train)
[1] 57  6
> dim(csd_test)
[1] 23  6
> length(c1)
[1] 6
> knn(train, test, c1, k=2, prob=TRUE)
Error in knn(train, test, c1, k = 2, prob = TRUE) : 
  'train' and 'class' have different lengths

I also tried this, and I'm still getting an error.

> c1 = as.factor(CSD[['Caesarian']])
> knn(train, test, c1, k=2, prob=TRUE)
Error in knn(train, test, c1, k = 2, prob = TRUE) : 
  'train' and 'class' have different lengths

I'm lost as to how to fix this.

Here's a sample of my data if that helps:

> dput(head(CSD))
structure(list(Age = c(22L, 26L, 26L, 28L, 22L, 26L), Delivery.NO = c(1L, 
2L, 2L, 1L, 2L, 1L), Delivery.NO.1 = c(1L, 1L, 0L, 1L, 1L, 0L
), BP = c(2L, 1L, 1L, 2L, 1L, 0L), Heart.Problem = c(1L, 1L, 
1L, 1L, 1L, 1L), Caesarian = structure(c(1L, 2L, 1L, 1L, 2L, 
1L), .Label = c("N", "Y"), class = "factor")), .Names = c("Age", 
"Delivery.NO", "Delivery.NO.1", "BP", "Heart.Problem", "Caesarian"
), row.names = c(NA, 6L), class = "data.frame")

EDIT: I did

c1 <- csd_train[, 6]

and the length(c1) is now 57, which is good. However, when I run the knn line, I'm now getting this new error:

Error in knn(csd_train, csd_test, c1, k = 2, prob = TRUE) : 
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(csd_train, csd_test, c1, k = 2, prob = TRUE) :
  NAs introduced by coercion
2: In knn(csd_train, csd_test, c1, k = 2, prob = TRUE) :
  NAs introduced by coercion

All of my predictor variables are numeric, and there are no missing values.

Upvotes: 1

Views: 14720

Answers (1)

Alex

Reputation: 4995

I think I have an answer.

Here is a working example using the iris dataset. You have to leave out the target variable in your train and test set. Pass the target variable for your train set to the argument cl within the knn call. Then it should work. In this example the target variable is in column 5.

The error occurs when the length of cl is not equal to the number of rows in your train set: knn needs exactly one class label per training row.

library(class)
library(caret)

dat<-iris

ind <- createDataPartition(dat$Species, p=0.70, list=FALSE)
dat_train <- dat[ ind,-5]         #leave your target variable out 
dat_test <- dat[-ind,-5]          #leave your target variable out
cl<-dat[ind,5]                    #your target variable for the train set
knn(dat_train, dat_test, cl, k=2, prob=TRUE)
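To see the length requirement in isolation, here is a minimal sketch that triggers the error on purpose and then runs the corrected call. It assumes only the class package is installed; the names train_x, test_x, cl_ok and cl_bad are made up for illustration, and a plain sample() split stands in for createDataPartition, so the split is not stratified.

```r
library(class)

set.seed(1)                          # arbitrary seed, only so the split is repeatable
ind <- sample(nrow(iris), 105)       # 70% of the 150 rows (not stratified)

train_x <- iris[ind, -5]             # predictors only, target column dropped
test_x  <- iris[-ind, -5]
cl_ok   <- iris[ind, 5]              # one label per training row
cl_bad  <- iris[1:6, 5]              # only 6 labels, similar to c1 in the question

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    knn(train_x, test_x, cl_bad, k = 2)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The condition knn relies on: labels and training rows must line up
stopifnot(length(cl_ok) == nrow(train_x))
pred <- knn(train_x, test_x, cl_ok, k = 2, prob = TRUE)
length(pred)                         # one prediction per test row
```

Checking length(cl) against nrow(train) before calling knn is a cheap way to catch this kind of mismatch early.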

*edit

I found the error in your code. If your data look like this:

> dim(csd_train)
 [1] 57  6
> dim(csd_test)
 [1] 23  6
> length(c1)
 [1] 6

it cannot work, since the length of c1 (6) does not match the number of rows in csd_train (57).

**Another edit:

Try exactly this:

ind <- createDataPartition(CSD$Caesarian, p=0.70, list=FALSE)
csd_train <- CSD[ ind,-6]
csd_test <- CSD[-ind,-6]
c1 <- CSD[ ind,6]
knn(csd_train , csd_test, c1, k=2, prob=TRUE)
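Dropping column 6 from both csd_train and csd_test should also take care of the "NAs introduced by coercion" error from the question's edit. My understanding is that knn converts its inputs with as.matrix, and a data frame that still contains the factor column turns into a character matrix, so "N" and "Y" cannot be converted to numbers. A minimal sketch of that behaviour, assuming a small made-up data frame (toy is hypothetical and only mimics the shape of CSD):

```r
library(class)

# Hypothetical toy data: numeric predictors plus a factor target, like CSD
toy <- data.frame(
  Age       = c(22, 26, 26, 28, 22, 26, 30, 24),
  BP        = c(2, 1, 1, 2, 1, 0, 2, 1),
  Caesarian = factor(c("N", "Y", "N", "N", "Y", "N", "Y", "N"))
)
train_rows <- 1:6
cl <- toy[train_rows, "Caesarian"]   # labels for the training rows only

# A data frame with a factor column becomes a character matrix
is.character(as.matrix(toy))

# Target column left in: the "N"/"Y" values cannot be coerced to numbers
msg <- tryCatch(
  suppressWarnings(knn(toy[train_rows, ], toy[-train_rows, ], cl, k = 2)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Target column dropped from both sets: the call goes through
pred <- knn(toy[train_rows, -3], toy[-train_rows, -3], cl, k = 2)
length(pred)                         # one prediction per test row
```

So the predictors can all be numeric with no missing values and the error still appears, as long as the target column is passed along inside train and test.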

Upvotes: 1
