user1642513

R caret LDA error when using resampling

I am running into a problem using LDA through caret with categorical predictors. Fitting without resampling works, but enabling resampling throws an error that isn't very informative. Has anyone seen this before?

Here is a reproducible toy example:

library(caret)
library(MASS)
DF <- data.frame(
  y  = sample(as.factor(1:2), 200, replace = TRUE),
  x1 = sample(as.factor(1:2), 200, replace = TRUE),
  x2 = sample(as.factor(1:2), 200, replace = TRUE)
)

# These two lines produce the same results
lda(DF[, -1], DF[, 1])
train(DF[, -1], DF[, 1], method = 'lda', trControl = trainControl(method = 'none'))$finalModel

# This gives an error
train(DF[, -1], DF[, 1], method = 'lda', trControl = trainControl(method = 'cv'))$finalModel

Error in train.default(DF[, -1], DF[, 1], method = "lda", trControl = trainControl(method = "cv")) : 
  Stopping

Upvotes: 0

Views: 1392

Answers (1)

thie1e

Reputation: 3688

This seems to happen when factor variables are passed as predictors through the x/y interface instead of the formula interface. With the formula interface, resampling works:

train(y ~ x1 + x2, data = DF, method = 'lda', 
      trControl = trainControl(method = 'cv'))$finalModel

Alternatively, after converting the factor variables to binary dummy variables, the x/y syntax also works:

# Convert independent variables to dummy variables
DF$x1 <- as.numeric(DF$x1 == "2")
DF$x2 <- as.numeric(DF$x2 == "2")
train(DF[, -1], DF[, 1], method = 'lda', 
      trControl = trainControl(method = 'cv'))$finalModel
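
For factors with more than two levels, the manual comparison above gets tedious. A minimal sketch of the same idea using caret's dummyVars() follows; the data frame name toy, the seed, and the choice of fullRank = TRUE are assumptions made for illustration, not part of the original question.

```r
library(caret)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical toy data with the same shape as DF in the question
toy <- data.frame(
  y  = factor(sample(1:2, 200, replace = TRUE)),
  x1 = factor(sample(1:2, 200, replace = TRUE)),
  x2 = factor(sample(1:2, 200, replace = TRUE))
)

# Build an encoder for the factor predictors; fullRank = TRUE drops one
# level per factor, so each two-level factor becomes a single 0/1 column
dv <- dummyVars(~ x1 + x2, data = toy, fullRank = TRUE)
X  <- as.data.frame(predict(dv, newdata = toy))

# All predictors are now numeric, so the x/y interface can be resampled
fit <- train(X, toy$y, method = "lda",
             trControl = trainControl(method = "cv"))
fit$finalModel
```

This should behave like the manual conversion, only without hard-coding the level "2".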

Note that the reported group means depend on the approach. With the formula interface or the dummy variables they are around 0.5, whereas the first two calls in the question report means around 1.5, apparently because they coerce the factor levels to the numeric codes 1 and 2.
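
The difference in group means can be reproduced in base R. This is only a sketch of the two encodings; that lda() and caret use exactly these functions internally is an assumption here, and the variable names are made up for the example.

```r
# A small two-level factor with levels "1" and "2"
x1  <- factor(c("1", "2", "2", "1"))
toy <- data.frame(x1 = x1)

# data.matrix() replaces a factor by its internal integer codes (1, 2, ...)
codes <- data.matrix(toy)[, "x1"]

# model.matrix() expands the factor into a 0/1 dummy column for level "2"
dummy <- model.matrix(~ x1, data = toy)[, "x12"]

mean(codes)  # 1.5
mean(dummy)  # 0.5
```

Both encodings carry the same information for a two-level factor, so the fitted classifier is equivalent; only the scale of the reported means shifts by 1.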

Upvotes: 0
