Reputation:
I am running into a problem using LDA through caret with categorical predictors. For some reason, enabling resampling throws an error that isn't very informative. Has anyone seen this before?
Here is a reproducible toy example:
library(caret)
library(MASS)
DF <- data.frame(y  = sample(as.factor(1:2), 200, replace = TRUE),
                 x1 = sample(as.factor(1:2), 200, replace = TRUE),
                 x2 = sample(as.factor(1:2), 200, replace = TRUE))
# These two lines produce the same results
lda(DF[, -1], DF[, 1])
train(DF[, -1], DF[, 1], method = 'lda', trControl = trainControl(method = 'none'))$finalModel
# This gives an error
train(DF[, -1], DF[, 1], method = 'lda', trControl = trainControl(method = 'cv'))$finalModel
Error in train.default(DF[, -1], DF[, 1], method = "lda", trControl = trainControl(method = "cv")) :
Stopping
Upvotes: 0
Views: 1392
Reputation: 3688
This seems to happen when using factor variables as independent variables while not using the formula interface. This works:
train(y ~ x1 + x2, data = DF, method = 'lda',
trControl = trainControl(method = 'cv'))$finalModel
Alternatively, after converting the factor variables to binary dummy variables, the x/y syntax also works:
# Convert independent variables to dummy variables
DF$x1 <- as.numeric(DF$x1 == "2")
DF$x2 <- as.numeric(DF$x2 == "2")
train(DF[, -1], DF[, 1], method = 'lda',
trControl = trainControl(method = 'cv'))$finalModel
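If you would rather not recode the factors by hand, caret's dummyVars() can build the dummy variables for you and the result can then be passed to the x/y interface. A minimal sketch, assuming DF still holds the original factor columns from the question:
# Build 0/1 dummy columns from the factor predictors; fullRank = TRUE drops
# the redundant level so each two-level factor becomes a single 0/1 column
dmy <- dummyVars(~ x1 + x2, data = DF, fullRank = TRUE)
X <- predict(dmy, newdata = DF)
train(X, DF$y, method = 'lda',
      trControl = trainControl(method = 'cv'))$finalModel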
Note that, depending on the approach, the reported group means are either around 0.5 or around 1.5: the first two calls in the question apparently coerce the factor levels to the numeric values 1 and 2, while the dummy-variable versions use 0 and 1.
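To see where the ~1.5 means come from (a sketch, again assuming the original factor-coded DF): data.matrix() replaces factors by their internal integer codes, which appears to be what the non-formula calls do here.
# Factors become their integer codes (1/2), so the group means sit near 1.5;
# the 0/1 dummy coding above gives group means near 0.5 instead
lda(data.matrix(DF[, -1]), DF[, 1])$means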
Upvotes: 0