Reputation: 137
I'm using the dataset found here: http://archive.ics.uci.edu/ml/datasets/Qualitative_Bankruptcy
When running this code:
library(caret)
bank <- read.csv("Qualitative_Bankruptcy.data.txt", header=FALSE, na.strings = "?",
                 strip.white = TRUE)
x=bank[1:6]
y=bank[7]
bank.knn <- train(x, y, method= "knn", trControl = trainControl(method = "cv"))
I get the following error: Error: nrow(x) == n is not TRUE
The only example I've found is the question "Error: nrow(x) == n is not TRUE when using Train in Caret", but my Y is already a factor vector with two classes, and all the X features are factors as well. I've tried using as.matrix and as.data.frame on both X and Y without success.
nrow(x) is equal to 250, but I'm not sure what the n is referring to in the package.
Upvotes: 3
Views: 9089
Reputation: 3688
y is not actually a vector, but a data.frame with one column, because bank[7] does not convert the 7th column into a vector, so length(y) is 1. Use bank[, 7] instead. It does not make a difference for x, but it could as well be generated by bank[, 1:6].
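To see the difference without loading the dataset, here is a minimal sketch on a made-up data frame (the name toy and its values are hypothetical stand-ins, not the bankruptcy data). It only uses base R and shows why the row count of x cannot match the length of a one-column data.frame:

```r
# Hypothetical toy data standing in for the bankruptcy file
toy <- data.frame(
  a   = factor(c("P", "N", "A")),
  cls = factor(c("B", "NB", "B"))
)

y_df  <- toy[2]     # list-style indexing: still a one-column data.frame
y_vec <- toy[, 2]   # matrix-style indexing: drops to a factor vector

class(y_df)         # "data.frame"
length(y_df)        # 1 (number of columns, not rows)
class(y_vec)        # "factor"
length(y_vec)       # 3 (one element per row)

# The kind of mismatch behind the error: rows of x vs. length of y
nrow(toy) == length(y_df)    # FALSE
nrow(toy) == length(y_vec)   # TRUE
```

The same check, length(y) against nrow(x), is a quick thing to run on the real data before calling train.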
Additionally, to make KNN work you probably have to convert the x data.frame, which consists of factor variables, to numeric dummy variables.
x=model.matrix(~. - 1, bank[, 1:6])
y=bank[, 7]
bank.knn <- train(x, y, method= "knn",
trControl = trainControl(method = "cv"))
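As a rough illustration of what the model.matrix call produces, here is a small sketch on an invented two-column factor data frame (toy_x and its levels are assumptions for illustration only). It uses base R alone, so it runs without caret or the dataset:

```r
# Hypothetical factor predictors, loosely mimicking P/A/N coded features
toy_x <- data.frame(
  f1 = factor(c("P", "A", "N", "P")),
  f2 = factor(c("A", "A", "N", "P"))
)

# "~ . - 1" uses every column and drops the intercept, so the first
# factor keeps an indicator column for each of its levels; later
# factors are coded with treatment contrasts (one level dropped)
dummies <- model.matrix(~ . - 1, data = toy_x)

is.numeric(dummies)   # TRUE: a numeric matrix usable for distance calculations
dim(dummies)          # 4 rows, 5 columns
colnames(dummies)     # "f1A" "f1N" "f1P" "f2N" "f2P"
```

Each row of the result is a 0/1 encoding of the original factor levels, which is the form a distance-based method can work with.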
Upvotes: 6
Reputation: 263301
I'm not a caret user, but I think you have two problems. The extraction method you used did not deliver an atomic vector but rather a list that contained a vector. If you ask for length(y) you get 1 rather than 250. The first error is easily solved by changing to this definition of y:
y <- bank[[7]] # extract a vector rather than a sublist
Then things get messy. The KNN method expects continuous data (and the error messages you get indicate that caret's author considers it a "regression method"), and you are passing factor data, so you need to choose a classification method instead.
Upvotes: 0