Reputation: 39
NN model to predict new data, but the error says: "'train' and 'class' have different lengths"
Can someone replicate and solve this error?
library(class)  # provides knn()
weather <- c(1, 1, 1, 0, 0, 0)
temperature <- c(1, 0, 0, 1, 0, 0)
golf <- c(1, 0, 1, 0, 1, 0)
df <- data.frame(weather, temperature, golf)
df_new <- data.frame(weather = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
                     temp = c(0, 0, 0, 0, 0, 0, 0, 0, 0),
                     sunnday = c(1, 1, 1, 0, 1, 1, 1, 0, 0))
pred_knn <- knn(train=df[, c(1,2)], test=df_new, cl=df$golf, k=1)
Thank you very much!
Upvotes: 0
Views: 1151
Reputation: 1
I had a similar issue with the Weekly data frame from the ISLR package:
knn.pred = knn(train$Lag2,test$Lag2,train$Direction,k=1)
Error in knn(train$Lag2, test$Lag2, train$Direction, k = 1) :
dims of 'test' and 'train' differ
where
train = subset(Weekly, Year >= 1990 & Year <= 2008)
test = subset(Weekly, Year < 1990 | Year > 2008)
I finally solved it by wrapping both the train and the test data in as.matrix(), like this:
knn.pred = knn(as.matrix(train$Lag2),as.matrix(test$Lag2),train$Direction,k=1)
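Why as.matrix() helps here can be sketched with toy data. A bare vector has no dim attribute, so knn() appears to treat a vector passed as test as a single observation with many columns, while the training vector becomes one column; the column counts then disagree. The values and the names train_x, train_y and test_x below are hypothetical stand-ins for train$Lag2, train$Direction and test$Lag2, not the Weekly data.

```r
library(class)  # knn() lives in the 'class' package

# Hypothetical toy data standing in for train$Lag2, train$Direction, test$Lag2
train_x <- c(0.10, 0.40, 0.35, 0.80, 0.90, 0.75)
train_y <- factor(c("Down", "Down", "Down", "Up", "Up", "Up"))
test_x  <- c(0.20, 0.88, 0.50)

# A bare vector has no dimensions; as.matrix() makes it a one-column matrix
print(dim(test_x))             # NULL
print(dim(as.matrix(test_x)))  # 3 rows, 1 column

# Passing bare vectors: capture the error message instead of stopping
msg <- tryCatch(
  knn(train_x, test_x, train_y, k = 1),
  error = function(e) conditionMessage(e)
)
print(msg)

# Passing one-column matrices: each element is one observation
pred <- knn(as.matrix(train_x), as.matrix(test_x), train_y, k = 1)
print(pred)
```

An alternative with the same effect would be to subset with drop = FALSE (for example train[, "Lag2", drop = FALSE]), which keeps the one-column shape without a separate conversion.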
Upvotes: 0
Reputation: 76
The knn function in R requires the training data to contain only the independent variables, because the dependent variable is passed separately through the "cl" parameter. Changing the call as shown below fixes this particular error.
pred_knn <- knn(train=df[,c(1,2)], test=df_new, cl=df$golf, k=1)
However, note that running the line above will throw another error. Because knn calculates the Euclidean distance between observations, it requires all independent variables to be numeric. The pages below have helpful related information. I would recommend using a different classifier for this particular data set.
https://towardsdatascience.com/k-nearest-neighbors-algorithm-with-examples-in-r-simply-explained-knn-1f2c88da405c
https://discuss.analyticsvidhya.com/t/how-to-resolve-error-na-nan-inf-in-foreign-function-call-arg-6-in-knn/7280/4
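For reference, here is a minimal sketch of a call that runs on the data from the question. It checks the conditions knn() seems to care about before calling it: one class label per training row, the same number of columns in train and test, and numeric predictors. Note that df_new as posted has three columns while the training predictors have two, so this sketch assumes that only weather and temp in df_new are predictors and that temp corresponds to temperature; train_x and test_x are names introduced here for illustration.

```r
library(class)  # provides knn()

weather     <- c(1, 1, 1, 0, 0, 0)
temperature <- c(1, 0, 0, 1, 0, 0)
golf        <- c(1, 0, 1, 0, 1, 0)
df <- data.frame(weather, temperature, golf)

df_new <- data.frame(weather = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
                     temp    = c(0, 0, 0, 0, 0, 0, 0, 0, 0),
                     sunnday = c(1, 1, 1, 0, 1, 1, 1, 0, 0))

# Predictors only; the class labels go into 'cl'
train_x <- df[, c("weather", "temperature")]
# Assumption: 'temp' in df_new plays the role of 'temperature' in df
test_x  <- df_new[, c("weather", "temp")]

# Sanity checks that mirror the error messages seen in this thread
stopifnot(nrow(train_x) == length(df$golf))  # 'train' and 'class' lengths
stopifnot(ncol(train_x) == ncol(test_x))     # dims of 'test' and 'train'
stopifnot(all(sapply(train_x, is.numeric)),
          all(sapply(test_x, is.numeric)))   # distances need numeric input

pred_knn <- knn(train = train_x, test = test_x, cl = factor(df$golf), k = 1)
print(length(pred_knn))  # one prediction per row of df_new
```

With this data, two training rows share the same predictor values (weather = 1, temperature = 0) but carry different golf labels, so the nearest neighbour of every new row is tied and the predicted class may vary between runs. That is one reason a different classifier, or more informative predictors, may suit this data set better.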
Hope this helps.
Upvotes: 1