Dana Averbuch

Reputation: 116

Inconsistent "best tune" and "Resampling results across tuning parameters" caret R package

I am trying to create a model using caret with a tune grid:

svmGrid <- expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50,100))

and then once again with a subset of this grid:

svmGrid <- expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50))

The problem is that I get a different "best tune" and different "resampling results across tuning parameters", even though the C value chosen in the first run also appears in the second tune grid.

I also encounter these discrepancies when using different options for the sampling parameter, and when using different summaryFunction options in trainControl().

Needless to say, since a different best model is selected every time, this affects the prediction results on the test set.

Does anyone have a clue why this is happening?

Reproducible data set:

library(caret)
library(doMC)
registerDoMC(cores = 8)

set.seed(2969)
imbal_train <- twoClassSim(100, intercept = -20, linearVars = 20)
imbal_test  <- twoClassSim(100, intercept = -20, linearVars = 20)
table(imbal_train$Class)

Run using the first tune grid

svmGrid <-  expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50,100))

up_fitControl = trainControl(method = "cv", number = 10 , savePredictions = TRUE, allowParallel = TRUE, sampling = "up", seeds = NA)


set.seed(5627)
up_inside <- train(Class ~ ., data = imbal_train,
                   method = "svmLinear",
                   trControl = up_fitControl,
                   tuneGrid = svmGrid,
                   scale = FALSE)

up_inside

First run output:

> up_inside
Support Vector Machines with Linear Kernel 

100 samples
 25 predictors
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
Addtional sampling using up-sampling

Resampling results across tuning parameters:

  C      Accuracy   Kappa         Accuracy SD  Kappa SD 
  1e-04  0.7734343   0.252201364  0.1227632    0.3198165
  1e-03  0.8225253   0.396439198  0.1245455    0.3626456
  1e-02  0.7595960   0.116150973  0.1431780    0.3046825
  1e-01  0.7686869   0.051430454  0.1167093    0.2712062
  1e+00  0.7695960  -0.004261294  0.1162279    0.2190151
  1e+01  0.7093939   0.111852756  0.2030250    0.3810059
  2e+01  0.7195960   0.040458804  0.1932690    0.2580560
  3e+01  0.7195960   0.040458804  0.1932690    0.2580560
  4e+01  0.7195960   0.040458804  0.1932690    0.2580560
  5e+01  0.7195960   0.040458804  0.1932690    0.2580560
  1e+02  0.7195960   0.040458804  0.1932690    0.2580560

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was C = 0.001. 

Run using the second tune grid

svmGrid <-  expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50))

up_fitControl = trainControl(method = "cv", number = 10 , savePredictions = TRUE, allowParallel = TRUE, sampling = "up", seeds = NA)


set.seed(5627)
up_inside <- train(Class ~ ., data = imbal_train,
                   method = "svmLinear",
                   trControl = up_fitControl,
                   tuneGrid = svmGrid,
                   scale = FALSE)

up_inside

Second run output:

> up_inside
Support Vector Machines with Linear Kernel 

100 samples
 25 predictors
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
Addtional sampling using up-sampling

Resampling results across tuning parameters:

  C      Accuracy   Kappa         Accuracy SD  Kappa SD 
  1e-04  0.8125253   0.392165694  0.13043060   0.3694786
  1e-03  0.8114141   0.375569633  0.12291273   0.3549978
  1e-02  0.7995960   0.205413345  0.06734882   0.2662161
  1e-01  0.7495960   0.017139266  0.09742161   0.2270128
  1e+00  0.7695960  -0.004261294  0.11622791   0.2190151
  1e+01  0.7093939   0.111852756  0.20302503   0.3810059
  2e+01  0.7195960   0.040458804  0.19326904   0.2580560
  3e+01  0.7195960   0.040458804  0.19326904   0.2580560
  4e+01  0.7195960   0.040458804  0.19326904   0.2580560
  5e+01  0.7195960   0.040458804  0.19326904   0.2580560

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was C = 1e-04.

Upvotes: 4

Views: 1321

Answers (1)

Tchotchke

Reputation: 3121

If you don't provide seeds in caret, it will choose them for you. Since your two grids have different lengths, the seeds generated for your folds will vary ever so slightly.

Below, I've pasted the example; I just renamed your second model to up_inside2 so the output for the comparison is easier to get:

> up_inside$control$seeds[[1]]
 [1] 825016 802597 128276 935565 324036 188187 284067  58853 923008 995461  60759
> up_inside2$control$seeds[[1]]
 [1] 825016 802597 128276 935565 324036 188187 284067  58853 923008 995461
> up_inside$control$seeds[[2]]
 [1] 966837 256990 592077 291736 615683 390075 967327 349693  73789 155739 916233
# See how the first seed here is the same as the last seed of the first model
> up_inside2$control$seeds[[2]]
 [1]  60759 966837 256990 592077 291736 615683 390075 967327 349693  73789
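
For reference, the structure caret expects for the seeds argument (see ?trainControl) is a list of B + 1 elements, where B is the number of resamples: each of the first B elements holds one integer per tuning-parameter combination, and the last element is a single integer used for the final model. A quick way to see this on the fitted objects above:

# Each of the first 10 elements should have one seed per value of C
# (11 for the first grid, 10 for the second); the last element has length 1.
sapply(up_inside$control$seeds, length)
sapply(up_inside2$control$seeds, length)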

If you now go ahead and set your own seeds, you'll get the same output:

# Seeds for your first train
myseeds <- list(c(1:10,1000), c(11:20,2000), c(21:30, 3000),c(31:40, 4000),c(41:50, 5000),
                c(51:60, 6000),c(61:70, 7000),c(71:80, 8000),c(81:90, 9000),c(91:100, 10000), c(343))
# Seeds for your second train
myseeds2 <- list(c(1:10), c(11:20), c(21:30),c(31:40),c(41:50),c(51:60),
                 c(61:70),c(71:80),c(81:90),c(91:100), c(343))
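
These lists are then supplied to trainControl() through its seeds argument before re-running train(); a minimal sketch, reusing svmGrid and imbal_train from the question (the second model would use myseeds2 together with the 10-value grid):

# Sketch: pass the manually defined seeds into trainControl()
up_fitControl <- trainControl(method = "cv", number = 10, savePredictions = TRUE,
                              allowParallel = TRUE, sampling = "up", seeds = myseeds)

set.seed(5627)
up_inside <- train(Class ~ ., data = imbal_train,
                   method = "svmLinear",
                   trControl = up_fitControl,
                   tuneGrid = svmGrid,
                   scale = FALSE)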

> up_inside
Support Vector Machines with Linear Kernel 

100 samples
 25 predictor
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
Addtional sampling using up-sampling

Resampling results across tuning parameters:

  C      Accuracy   Kappa      
  1e-04  0.7714141  0.239823027
  1e-03  0.7914141  0.332834590
  1e-02  0.7695960  0.207000745
  1e-01  0.7786869  0.103957926
  1e+00  0.7795960  0.006849817
  1e+01  0.7093939  0.111852756
  2e+01  0.7195960  0.040458804
  3e+01  0.7195960  0.040458804
  4e+01  0.7195960  0.040458804
  5e+01  0.7195960  0.040458804
  1e+02  0.7195960  0.040458804

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was C = 0.001. 
> up_inside2
Support Vector Machines with Linear Kernel 

100 samples
 25 predictor
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
Addtional sampling using up-sampling

Resampling results across tuning parameters:

  C      Accuracy   Kappa      
  1e-04  0.7714141  0.239823027
  1e-03  0.7914141  0.332834590
  1e-02  0.7695960  0.207000745
  1e-01  0.7786869  0.103957926
  1e+00  0.7795960  0.006849817
  1e+01  0.7093939  0.111852756
  2e+01  0.7195960  0.040458804
  3e+01  0.7195960  0.040458804
  4e+01  0.7195960  0.040458804
  5e+01  0.7195960  0.040458804

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was C = 0.001.
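
With matching seeds for the shared C values, both fits should now agree on the chosen cost; a quick sanity check on the refit objects:

# Both should report C = 0.001, matching the printouts above.
up_inside$bestTune$C
up_inside2$bestTune$C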

Upvotes: 4
