stefan485

Reputation: 39

knn train and class have different lengths

I am trying to use a kNN model to predict new data, but I get the error: "'train' and 'class' have different lengths"

Can someone reproduce and solve this error?

weather <- c(1, 1, 1, 0, 0, 0)
temperature <- c(1, 0, 0, 1, 0, 0)
golf <- c(1, 0, 1, 0, 1, 0)
df <- data.frame(weather, temperature, golf)
df_new <- data.frame(weather = c(1,1,1,1,1,1,1,1,1), temp = c(0,0,0,0,0,0,0,0,0), sunnday= c(1,1,1,0,1,1,1,0,0))
pred_knn <- knn(train=df[, c(1,2)], test=df_new, cl=df$golf, k=1)

Thank you very much!

Upvotes: 0

Views: 1151

Answers (2)

Sarah

Reputation: 1

I had a similar issue with the Weekly data frame from the ISLR package:

knn.pred = knn(train$Lag2,test$Lag2,train$Direction,k=1)
Error in knn(train$Lag2, test$Lag2, train$Direction, k = 1) :
dims of 'test' and 'train' differ

where

train = subset(Weekly, Year >= 1990 & Year <= 2008)
test = subset(Weekly, Year < 1990 | Year > 2008)

I finally solved it by wrapping both the training and the test input in as.matrix(), like this:

knn.pred = knn(as.matrix(train$Lag2),as.matrix(test$Lag2),train$Direction,k=1)
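A likely reason the as.matrix() wrapper helps: train$Lag2 and test$Lag2 are bare vectors with no dim attribute, and knn appears to treat a dimensionless test input as a single row, so its column count no longer matches the one-column training input. The sketch below uses made-up stand-in data (train_x, test_x and train_y are hypothetical names, not the Weekly data) to show the shapes involved.

```r
library(class)  # provides knn(); ships with standard R installations

# Hypothetical stand-in data, NOT the ISLR Weekly data set
set.seed(1)
train_x <- rnorm(20)                                # one predictor, 20 training rows
test_x  <- rnorm(5)                                 # same predictor, 5 test rows
train_y <- factor(rep(c("Down", "Up"), each = 10))  # one label per training row

# Bare vectors carry no dimensions; as.matrix() turns each into a one-column matrix
is.null(dim(test_x))     # TRUE
dim(as.matrix(train_x))  # 20 1
dim(as.matrix(test_x))   # 5 1

# Passing the bare vectors is expected to fail; capture the message instead of stopping
res <- tryCatch(
  knn(train_x, test_x, train_y, k = 1),
  error = function(e) conditionMessage(e)
)
print(res)

# With explicit one-column matrices, the column counts agree
pred <- knn(as.matrix(train_x), as.matrix(test_x), train_y, k = 1)
length(pred)  # one prediction per test row
```

Subsetting with train[, "Lag2", drop = FALSE] should have a similar effect, since it keeps the column as a one-column data frame instead of dropping to a vector.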

Upvotes: 0

Nick Kharas

Reputation: 76

The knn function in R requires the training data to contain only the independent variables, because the dependent variable is passed separately through the "cl" parameter. Changing the line as shown below will fix this particular error.

pred_knn <- knn(train=df[,c(1,2)], test=df_new, cl=df$golf, k=1)

However, note that running the line above will throw another error. Because knn calculates the Euclidean distance between observations, it requires all independent variables to be numeric. The pages below have helpful related information. I would recommend using a different classifier for this particular data set.

https://towardsdatascience.com/k-nearest-neighbors-algorithm-with-examples-in-r-simply-explained-knn-1f2c88da405c
https://discuss.analyticsvidhya.com/t/how-to-resolve-error-na-nan-inf-in-foreign-function-call-arg-6-in-knn/7280/4
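To illustrate the point about train and cl, here is a minimal sketch using the data from the question. It rests on two assumptions that are not in the original post: df_new is rebuilt with only the two predictor columns (the posted df_new has three columns, weather, temp and sunnday, so with the data as posted the next error is probably "dims of 'test' and 'train' differ"), and its second column is renamed to temperature to mirror the training data.

```r
library(class)  # provides knn()

# Training data from the question
weather     <- c(1, 1, 1, 0, 0, 0)
temperature <- c(1, 0, 0, 1, 0, 0)
golf        <- c(1, 0, 1, 0, 1, 0)
df <- data.frame(weather, temperature, golf)

# Assumption: the new data holds only the two predictors, in the same order as train
df_new <- data.frame(weather = rep(1, 9), temperature = rep(0, 9))

# Predictors only; the label goes into cl
train_x <- df[, c("weather", "temperature")]

# The two shape checks that the error messages in this thread correspond to
stopifnot(nrow(train_x) == length(df$golf))  # 'train' and 'class' lengths
stopifnot(ncol(train_x) == ncol(df_new))     # dims of 'test' and 'train'

pred_knn <- knn(train = train_x, test = df_new, cl = factor(df$golf), k = 1)
pred_knn
```

With this tiny binary data set, several training rows are identical but carry different golf labels, so the nearest neighbour is tied and knn breaks the tie at random. The predictions can therefore change from run to run, which fits the suggestion to consider a different classifier here.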

Hope this helps.

Upvotes: 1
