B_man

Reputation: 109

Why am I getting "'train' and 'class' have different lengths"?

Why am I getting this error:

'train' and 'class' have different lengths

even though both have the same number of rows?

y_pred=knn(train=training_set[,1:2],
       test=Test_set[,-3],
       cl=training_set[,3],
       k=5)

Their dimensions are given below:

 > dim(training_set[,-3])
[1] 300   2
> dim(training_set[,3])
[1] 300   1



 > head(training_set)
# A tibble: 6 x 3
     Age EstimatedSalary Purchased
   <dbl>           <dbl> <fct>    
1 -1.77           -1.47  0        
2 -1.10           -0.788 0        
3 -1.00           -0.360 0        
4 -1.00            0.382 0        
5 -0.523           2.27  1        
6 -0.236          -0.160 0   

> Test_set
# A tibble: 100 x 3
      Age EstimatedSalary Purchased
    <dbl>           <dbl> <fct>    
 1 -0.304          -1.51  0        
 2 -1.06           -0.325 0        
 3 -1.82            0.286 0        
 4 -1.25           -1.10  0        
 5 -1.15           -0.485 0        
 6  0.641          -1.32  1        
 7  0.735          -1.26  1        
 8  0.924          -1.22  1        
 9  0.829          -0.582 1        
10 -0.871          -0.774 0  

Upvotes: 0

Views: 61

Answers (1)

David_O

Reputation: 1153

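It's because knn expects cl to be a vector (or factor), and you are giving it a table with one column. Your training_set is a tibble, so training_set[,3] stays a one-column tibble rather than dropping to a vector. The check knn performs is whether nrow(train) == length(cl). If cl is a table, length() returns the number of columns, not the number of rows, so it does not give the answer you are expecting. Compare: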

> length(data.frame(a=c(1,2,3)))
[1] 1
> length(c(1,2,3))
[1] 3

If you use cl=training_set$Purchased, which extracts the vector from the table, that should fix it.
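As a minimal sketch of the failure and the fix: the data below is a made-up stand-in for the training_set and Test_set in the question (the sizes and values are assumptions), and drop=FALSE on a plain data.frame is used only to imitate the one-column table that a tibble or data.table returns from [,3].

```r
library(class)  # provides knn(); a recommended package shipped with R

set.seed(1)
# Hypothetical stand-in data, not the asker's real data
training_set <- data.frame(
  Age = rnorm(30),
  EstimatedSalary = rnorm(30),
  Purchased = factor(rep(c(0, 1), 15))
)
test_set <- data.frame(Age = rnorm(10), EstimatedSalary = rnorm(10))

# A one-column table: length() counts columns, so this is 1
cl_table <- training_set[, 3, drop = FALSE]
length(cl_table)

# The extracted vector: length() counts elements, so this is 30
cl_vector <- training_set$Purchased
length(cl_vector)

# Passing the one-column table triggers the error from the question;
# tryCatch captures the message instead of stopping the script
err <- tryCatch(
  knn(train = training_set[, 1:2], test = test_set, cl = cl_table, k = 5),
  error = function(e) conditionMessage(e)
)
err

# Passing the vector works and returns one prediction per test row
y_pred <- knn(train = training_set[, 1:2], test = test_set, cl = cl_vector, k = 5)
length(y_pred)
```

training_set[[3]] or training_set[["Purchased"]] should work equally well as the cl argument, since both extract the column itself.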

This is a specific gotcha if you are moving from data.frame to data.table (or to a tibble, as in your case), because the default drop behaviour is different:

> dt <- data.table(a=1:3, b=4:6)
> dt[,2]
   b
1: 4
2: 5
3: 6
> df <- data.frame(a=1:3, b=4:6)
> df[,2]
[1] 4 5 6
> df[,2, drop=FALSE]
  b
1 4
2 5
3 6
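To sidestep the drop difference entirely, one option is to extract columns with [[ or $, which return the column itself regardless of the table class. The sketch below uses a small made-up table; the tibble and data.table checks are guarded with requireNamespace() because whether those packages are installed is an assumption.

```r
df <- data.frame(a = 1:3, b = 4:6)

# `[[` and `$` return the column as a plain vector
stopifnot(is.vector(df[["b"]]))
stopifnot(is.vector(df$b))

# `[` with drop = FALSE keeps a one-column data.frame
stopifnot(is.data.frame(df[, "b", drop = FALSE]))

# Optional: the same check on a tibble, if the package is available
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::tibble(a = 1:3, b = 4:6)
  stopifnot(is.data.frame(tb[, 2]))  # `[` keeps the table
  stopifnot(is.vector(tb[[2]]))      # `[[` extracts the vector
}

# Optional: the same check on a data.table, if the package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::data.table(a = 1:3, b = 4:6)
  stopifnot(is.data.frame(dt[, 2]))  # `[` keeps the table
  stopifnot(is.vector(dt[[2]]))      # `[[` extracts the vector
}
```

Code that uses [[ or $ for single columns behaves the same whichever of the three table types it is handed.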

Upvotes: 2
