Reputation: 2300
I've created and tuned multiple models, but I run into issues when I try to predict with them. I first run the following code to tune an LDA model.
library(MASS)
library(caret)
library(randomForest)
data(survey)
data<-survey
#create training and test set
split <- createDataPartition(data$W.Hnd, p=.8)[[1]]
train<-data[split,]
test<-data[-split,]
#creating training parameters
control <- trainControl(method = "cv",
                        number = 10,
                        p = .8,
                        savePredictions = TRUE,
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)
#fitting and tuning model
lda_tune <- train(W.Hnd ~ .,
                  data = train,
                  method = "glm",
                  metric = "ROC",
                  trControl = control)
However, when I run
results <- predict(rf_tune, newdata=test)
the output has only 32 rows, even though the test set has 46 rows. This is a problem because I build a data.frame of the test results with the predicted values from multiple models and analyze it with a confusion matrix. For instance, when I run this
results<-data.frame(obs = test$W.Hnd, lda = predict(lda_tune, newdata = test))
I get the error
Error in `$<-.data.frame`(`*tmp*`, "rf_results", value = c(2L, 2L, 2L, :
  replacement has 32 rows, data has 46
Can someone explain why caret returns 32 predicted values when there are clearly 46 rows to predict, even when I explicitly ask the model to predict the values in the test set?
Upvotes: 1
Views: 1497
Reputation: 23608
Running your code resulted in errors on my side: twoClassSummary returns an error. Ignoring that, you first talk about lda_tune and later about rf_tune.
Accounting for these issues, the problem lies with missing values in your test set. If you check nrow(test[complete.cases(test), ])
you will see that it returns 33 cases, which is exactly the number of rows predict returns. The exact count can differ between runs because the split is random.
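To see where the missing values come from, a quick check along these lines may help. This is only a sketch using base R and the survey data from MASS; the variable names n_complete and na_per_column are made up for illustration, and it looks at the full data set rather than a particular random test split.

```r
library(MASS)   # provides the survey data set
data(survey)

# Number of rows with no missing value in any column
n_complete <- sum(complete.cases(survey))

# Number of missing values per column, to see which variables are responsible
na_per_column <- colSums(is.na(survey))

n_complete                          # rows that are fully observed
nrow(survey) - n_complete           # rows that contain at least one NA
na_per_column[na_per_column > 0]    # only the columns that contain NAs
```

Running the same two checks on test instead of survey shows how many test rows are incomplete for a given split.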
I added the code below for reference, including rf_tune and lda_tune and their results.
library(MASS)
library(caret)
library(randomForest)
data(survey)
data<-survey
#create training and test set
split <- createDataPartition(data$W.Hnd, p=.8)[[1]]
train<-data[split,]
test<-data[-split,]
#creating training parameters
# summaryFunction = twoClassSummary is left out here because it raised an error
control <- trainControl(method = "cv",
                        number = 10,
                        p = .8,
                        savePredictions = TRUE,
                        classProbs = TRUE)
#fitting and tuning model
lda_tune <- train(W.Hnd ~ .,
                  data = train,
                  method = "glm",
                  metric = "ROC",
                  trControl = control)
rf_tune <- train(W.Hnd ~ .,
                 data = train,
                 method = "rf",
                 metric = "ROC",
                 trControl = control)
# keep only the observed outcomes for complete rows so the lengths match the predictions
lda_results <- data.frame(obs = test$W.Hnd[complete.cases(test)], lda = predict(lda_tune, newdata = test))
rf_results <- data.frame(obs = test$W.Hnd[complete.cases(test)], rf = predict(rf_tune, newdata = test))
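If you would rather keep all 46 test rows in one data.frame, an alternative is to predict on the complete rows only and put the predictions back in their original positions, with NA for the rows that could not be predicted. The sketch below uses base R only. predict_padded is a hypothetical helper, not part of caret, and dummy_predict is a stand-in for a real model that always predicts the majority class.

```r
library(MASS)   # provides the survey data set
data(survey)

# Hypothetical helper: call predict_fun on the complete rows only and
# return a vector as long as newdata, with NA for incomplete rows.
predict_padded <- function(newdata, predict_fun) {
  keep <- complete.cases(newdata)
  preds <- predict_fun(newdata[keep, , drop = FALSE])
  out <- rep(NA_character_, nrow(newdata))
  out[keep] <- as.character(preds)
  out
}

# Stand-in for a fitted model: always predicts the most frequent class
majority <- names(which.max(table(survey$W.Hnd)))
dummy_predict <- function(d) rep(majority, nrow(d))

# Arbitrary 46 rows standing in for the test set
test <- survey[1:46, ]

results <- data.frame(obs = test$W.Hnd,
                      pred = predict_padded(test, dummy_predict))

nrow(results) == nrow(test)   # predictions now line up with the test rows
```

With the models above you would pass something like function(d) predict(lda_tune, newdata = d) as predict_fun. Rows with an NA prediction still need to be dropped before building a confusion matrix, but every model's column then has the same length.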
Upvotes: 2