JFG123
JFG123

Reputation: 597

Recipe vs Formula vs X/Y Interface reproducibility for gbm with caret

I have trained the same model on the iris data set to investigate the reproducibility of each method. It seems that there is a discrepency between models when using all.equal() for the models trained with the recipes interface, but not with the formula or x/y interface. This issue seems to be specific to gbm (same structure works fine for Model = rf or lm).

Is there something that the recipes formula does that is specific for gbm. Or is it my computer set up. Curious to see if others can reproduce the errors.

library(plyr)
library(tidyverse)
library(gbm)
library(caret)
library(recipes)

# recipe to be supplied
Recipe.Obj <- recipe(Sepal.Length ~ ., data = iris)

# train control object
TC.Obj <- trainControl("cv", savePredictions = "all", summaryFunction = defaultSummary, returnResamp = "all")

Model = "gbm"
Recipe = Recipe.Obj
TC = TC.Obj
Training.Data.Set = iris
Metric = "RMSE"

# Using a recipe object

set.seed(0)
Model.Obj.1 <- train(Recipe,
                     method = Model,
                     data = Training.Data.Set,
                     trControl = TC,
                     metric = Metric,
                     verbose = FALSE,
                     tuneLength = 3
)

set.seed(0)
Model.Obj.2 <- train(Recipe,
                     method = Model,
                     data = Training.Data.Set,
                     trControl = TC,
                     metric = Metric,
                     verbose = FALSE,
                     tuneLength = 3
)

# does not return equal objects
all.equal(Model.Obj.1, Model.Obj.2)

[1] "Component “results”: Component “RMSE”: Mean relative difference: 0.0006642504"     
 [2] "Component “results”: Component “Rsquared”: Mean relative difference: 0.0007520043" 
 [3] "Component “results”: Component “MAE”: Mean relative difference: 0.001153074"       
 [4] "Component “results”: Component “RMSESD”: Mean relative difference: 0.001743611"    
 [5] "Component “results”: Component “RsquaredSD”: Mean relative difference: 0.006758813"
 [6] "Component “results”: Component “MAESD”: Mean relative difference: 0.006780553"     
 [7] "Component “pred”: Component “pred”: Mean relative difference: 0.00312338"          
 [8] "Component “resample”: Component “RMSE”: Mean relative difference: 0.003475617"     
 [9] "Component “resample”: Component “Rsquared”: Mean relative difference: 0.002615116" 
[10] "Component “resample”: Component “MAE”: Mean relative difference: 0.004711215"      
[11] "Component “times”: Component “everything”: Mean relative difference: 0.148289"     
[12] "Component “times”: Component “final”: Mean relative difference: 0.5"

# Using formula

set.seed(0)
Model.Obj.3 <- train(Sepal.Length ~ .,
                     method = Model,
                     data = Training.Data.Set,
                     trControl = TC,
                     metric = Metric,
                     verbose = FALSE,
                     tuneLength = 3
)

set.seed(0)
Model.Obj.4 <- train(Sepal.Length ~ .,
                     method = Model,
                     data = Training.Data.Set,
                     trControl = TC,
                     metric = Metric,
                     verbose = FALSE,
                     tuneLength = 3
)

#returns equal objects except for times
all.equal(Model.Obj.3, Model.Obj.4)

# Using x/y

set.seed(0)
Model.Obj.5 <- train(Training.Data.Set[,-1],Training.Data.Set[,1],
                     method = Model,
                     trControl = TC,
                     metric = Metric,
                     verbose = FALSE,
                     tuneLength = 3
)

set.seed(0)
Model.Obj.6 <- train(Training.Data.Set[,-1], Training.Data.Set[,1],
                     method = Model,
                     trControl = TC,
                     metric = Metric,
                     verbose = FALSE,
                     tuneLength = 3
)
#returns equal objects except for times
all.equal(Model.Obj.5, Model.Obj.6)

Session Info:

sessionInfo()

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] recipes_0.1.11  caret_6.0-86    lattice_0.20-38 gbm_2.1.5       forcats_0.4.0   stringr_1.4.0   dplyr_0.8.4     purrr_0.3.3     readr_1.3.1    
[10] tidyr_1.0.2     tibble_2.1.3    ggplot2_3.2.1   tidyverse_1.3.0 plyr_1.8.4     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1           lubridate_1.7.4      prettyunits_1.0.2    ps_1.3.0             class_7.3-14         assertthat_0.2.1    
 [7] packrat_0.5.0        ipred_0.9-8          foreach_1.4.4        R6_2.4.0             cellranger_1.1.0     backports_1.1.4     
[13] stats4_3.5.2         reprex_0.3.0         httr_1.4.1           pillar_1.4.3         rlang_0.4.5          lazyeval_0.2.1      
[19] readxl_1.3.1         data.table_1.11.8    rstudioapi_0.11      callr_3.4.3          rpart_4.1-13         Matrix_1.2-15       
[25] splines_3.5.2        gower_0.2.0          munsell_0.5.0        broom_0.5.4          compiler_3.5.2       modelr_0.1.6        
[31] pkgconfig_2.0.2      pkgbuild_1.0.6.9000  nnet_7.3-12          tidyselect_0.2.5     prodlim_2018.04.18   gridExtra_2.3       
[37] codetools_0.2-15     fansi_0.4.0          crayon_1.3.4         dbplyr_1.4.2         withr_2.1.2          ModelMetrics_1.2.2.2
[43] MASS_7.3-51.5        grid_3.5.2           nlme_3.1-137         jsonlite_1.6.1       gtable_0.2.0         lifecycle_0.1.0     
[49] DBI_1.0.0            magrittr_1.5         pROC_1.13.0          scales_1.0.0         cli_2.0.2            stringi_1.3.1       
[55] reshape2_1.4.3       fs_1.3.1             timeDate_3043.102    xml2_1.2.2           generics_0.0.2       vctrs_0.2.3         
[61] lava_1.6.5           iterators_1.0.10     tools_3.5.2          glue_1.4.0           hms_0.5.3            processx_3.4.1      
[67] survival_2.43-3      colorspace_1.4-0     rvest_0.3.5          haven_2.2.0         

Upvotes: 3

Views: 102

Answers (0)

Related Questions