Ricky

Reputation: 4686

caret "Error in train.default(x, y, weights = w, ...) : final tuning parameters could not be determined" when optimizing for ROC

I'm trying to build a binary classifier, modelling with caret to optimize ROC. The method I was attempting was C5.0, and I got the following error and warnings:

Error in train.default(x, y, weights = w, ...) : 
  final tuning parameters could not be determined
In addition: Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
2: In train.default(x, y, weights = w, ...) :
  missing values found in aggregated results

I had modelled the same training data with C5.0 and caret earlier, optimizing for Accuracy and without twoClassSummary in the control, and it ran without error.

My tuning grid and control for the ROC run were:

c50Grid <- expand.grid(.trials = c(1:9, (1:10)*10),
                       .model = c("tree", "rules"),
                       .winnow = c(TRUE, FALSE))

fitTwoClass <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 5,
  classProbs = TRUE,
  summaryFunction = twoClassSummary
  )

For the Accuracy run, I omitted the classProbs and summaryFunction portions of the control.
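For reference, the Accuracy-run control would have looked roughly like this (a sketch restating what I described above; `fitAccuracy` is just an illustrative name):

```r
# Sketch of the Accuracy-run control: same repeated CV, but no class
# probabilities and the default summary function (so Accuracy/Kappa are used).
library(caret)

fitAccuracy <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 5
)
```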

For the modelling, the command was:

fitModel <- train(
  Unhappiness ~ .,
  data = dnumTrain,
  tuneGrid = c50Grid,
  method = "C5.0",
  trControl = fitTwoClass,
  tuneLength = 5,
  metric = "ROC"
  )

Can anyone advise how to troubleshoot this? I'm not sure which parameter, if any, needs tweaking to make this work; I believe the dataset itself should be fine, since it ran without error when optimizing for Accuracy.

To reproduce, training set dnumTrain can be loaded from the file in this link.

Upvotes: 2

Views: 3796

Answers (1)

Ricky

Reputation: 4686

I think I may have solved this: after seeing in the comments that @Pascal was able to run the code without error, and realising I got a fairly random result running it with ctree, I investigated other areas that could involve randomness: the random seed.

It seems the problem comes from parallelising the process with doSNOW across 4 processors: the seed needs to be set for each resampling iteration to stop randomness creeping in (see the answer to this question). I suspect that randomness caused some resampled folds to have no valid performance values.

In any case, I set the seeds as below:

CVfolds <- 5
CVreps  <- 5
tuneLength <- 5                  # same tuneLength as passed to train()
seedNum <- CVfolds * CVreps + 1  # one seed vector per resample, plus one for the final model
seedLen <- CVfolds + tuneLength
# create manual seeds list for parallel-processing repeatability
set.seed(123)
seeds <- vector(mode = "list", length = seedNum)
for (i in 1:(seedNum - 1)) seeds[[i]] <- sample.int(1000, seedLen)
## For the last model:
seeds[[seedNum]] <- sample.int(1000, 1)

fitTwoClass <- trainControl(
  method = "repeatedcv",
  number = CVfolds,
  repeats = CVreps,
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  seeds = seeds
  )

So far I have re-trained fitModel 3 times with no error or warning, so I hope this is indeed the answer to my problem.
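For completeness, the parallel backend was registered along these lines (a sketch assuming doSNOW with 4 local workers; `cl` is just an illustrative name):

```r
# Register a 4-worker doSNOW backend before calling train(); caret picks it
# up automatically via foreach. Stop the cluster when done.
library(doSNOW)

cl <- makeCluster(4, type = "SOCK")  # 4 local worker processes
registerDoSNOW(cl)

# ... call train(...) here; the seeds in trainControl keep it reproducible ...

stopCluster(cl)
```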

Upvotes: 2
