Ravi Shankar Hela

Reputation: 93

caret train Error: At least one of the class levels is not a valid R variable name

I am trying to run

control <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

model_fit <- caret::train(
  Survived ~ .,
  data = train_cleaned_model_train,
  method = "glm",
  family = binomial(link = logit),
  preProc = c("knnImpute", "nzv"),
  metric = "ROC",
  trControl = control
)

The names of the input variables seem fine, as the str() output below shows:

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   534 obs. of  9 variables:
 $ PassengerId: int  71 744 353 636 436 433 687 329 95 193 ...
 $ Survived   : Factor w/ 2 levels "0","1": 1 1 1 2 2 2 1 2 1 2 ...
 $ Pclass     : int  2 3 3 2 1 2 3 3 3 3 ...
 $ Sex        : chr  "male" "male" "male" "female" ...
 $ Age        : num  32 24 15 28 14 42 14 31 59 19 ...
 $ SibSp      : int  0 1 1 0 1 1 4 1 0 1 ...
 $ Parch      : int  0 0 1 0 2 0 1 1 0 0 ...
 $ Fare       : num  10.5 16.1 7.23 13 120 ...
 $ Embarked   : chr  "S" "S" "C" "S" ...

I have seen other questions about the same error. They mostly involve variable names that start with special characters or numbers. That does not seem to be the case here.

Can anyone give insights?

Upvotes: 1

Views: 703

Answers (1)

Ravi Shankar Hela

Reputation: 93

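I realized that the error is about the levels of the outcome factor, not the column names. With classProbs = TRUE, caret uses the class levels as column names for the predicted probabilities, so they must be valid R variable names. The levels "0" and "1" of Survived are not, so the outcome needs to be recoded with character labels instead of numeric 1 or 0.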
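
A minimal sketch of that recoding in base R, assuming a toy vector standing in for the Survived column. The data and the labels "no" and "yes" are illustrative choices, not taken from the question; any syntactically valid names should work.

```r
# Toy stand-in for the Survived column (hypothetical data, not the asker's).
survived <- factor(c(0, 0, 1, 1, 0), levels = c(0, 1))

# make.names() rewrites invalid names, so a level that changes is not valid.
# "0" and "1" become "X0" and "X1", which is what the error complains about.
levels(survived) == make.names(levels(survived))

# Relabel the levels with valid names; "no"/"yes" are illustrative choices.
survived_fixed <- factor(survived, levels = c("0", "1"), labels = c("no", "yes"))

# The new levels pass the same check unchanged.
levels(survived_fixed) == make.names(levels(survived_fixed))
```

Applying the same relabelling to train_cleaned_model_train$Survived before calling caret::train should let classProbs = TRUE build its probability columns from the level names.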

Upvotes: 1
