Caret doesn't run in parallel

Question

Getting caret to actually run in parallel depends on R and on the caret and doMC packages, as described in "Parallelizing Caret code".

Is anyone working with an environment similar to mine? What is the latest R version in which caret parallelization works correctly?

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=C                  LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-52    ggplot2_1.0.1   lattice_0.20-31 doMC_1.3.3      iterators_1.0.7 foreach_1.4.2   RStudioAMI_0.2 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1         magrittr_1.5        splines_3.2.1       MASS_7.3-41         munsell_0.4.2       colorspace_1.2-6   
 [7] minqa_1.2.4         car_2.1-0           stringr_1.0.0       plyr_1.8.3          tools_3.2.1         pbkrtest_0.4-2     
[13] nnet_7.3-9          grid_3.2.1          gtable_0.1.2        nlme_3.1-120        mgcv_1.8-6          quantreg_5.19      
[19] MatrixModels_0.4-1  gtools_3.5.0        lme4_1.1-9          digest_0.6.8        Matrix_1.2-0        nloptr_1.0.4       
[25] reshape2_1.4.1      codetools_0.2-11    stringi_0.5-5       BradleyTerry2_1.0-6 scales_0.3.0        stats4_3.2.1       
[31] SparseM_1.7         brglm_0.5-9         proto_0.3-10

Update 1 : My code follows :

library(doMC)
registerDoMC(cores = 4)  # register the foreach backend before calling train()
library(caret)

# Build "target ~ x1 + x2 + ..." from every column except the response
predictors <- setdiff(names(m_input_data), "target")
classification_formula <- as.formula(paste("target ~", paste(predictors, collapse = " + ")))

CVfolds <- 2
CVreps  <- 5
ma_control <- trainControl(method = "repeatedcv",
                           number = CVfolds,
                           repeats = CVreps,
                           returnResamp = "final",
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary,
                           allowParallel = TRUE,
                           verboseIter = TRUE)

# Six mtry values: 2, 8, 14, 20, 26, 32
rf_tuneGrid <- expand.grid(mtry = seq(2, 32, length.out = 6))
rf <- train(classification_formula, data = m_input_data, method = "rf",
            metric = "ROC", trControl = ma_control,
            tuneGrid = rf_tuneGrid, ntree = 101)

Update 2 : When I run the script from the command line, only one core is working. When I run the same script from RStudio, the parallelism works, since I see 4 processes in top. But a second later this error occurs:

  Error in names(resamples) <- gsub("^\.", "", names(resamples)) : 
   attempt to set an attribute on NULL
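
To compare the two environments (command line vs. RStudio), one option is to ask foreach which backend it currently sees before calling train(). This is only a minimal sketch, assuming foreach and doMC are installed as in the sessionInfo() above; the worker count of 2 and the name n_workers are arbitrary choices for illustration, not part of the original script.

```r
library(foreach)
library(doMC)

n_workers <- 2                      # illustrative value, not from the original script
registerDoMC(cores = n_workers)

# What foreach (and therefore caret) will use for %dopar%
cat("backend registered:", getDoParRegistered(), "\n")
cat("backend name:      ", getDoParName(), "\n")
cat("workers:           ", getDoParWorkers(), "\n")

# Run a few trivial tasks and collect the process ID that executed each one.
# With a working fork-based backend these should differ from the parent PID.
pids <- foreach(i = 1:4, .combine = c) %dopar% Sys.getpid()
cat("parent PID:", Sys.getpid(), " worker PIDs:", unique(pids), "\n")
```

If the command-line run reports a single worker or a sequential backend, the registration step is probably not taking effect there.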

Update 4 :

Hi, it seems the problem was an R session that had been terminated. Each time I started the AWS instance, I ran the R code without restarting the R engine. Now, each time I reload RStudio in the browser, I do Session -> Restart R, and it seems to run. I am now checking whether the same holds when I run the script from the Ubuntu command line.

In general it runs but never finishes. Caret parallelizes at the data level, meaning it can process each resample in a separate process. But if each resample is still large (100,000 rows / 2 folds, with 2,000 features), it can be hard for a single processor to finish its share. Am I right?
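
To put rough numbers on that, here is a small back-of-the-envelope calculation using the figures from this post. The assumption that the data is held as dense 8-byte doubles is mine, and the count of fits ignores the final model that caret refits on the full data afterwards.

```r
# Figures taken from the question; dense double storage is an assumption.
n_rows     <- 100000
n_features <- 2000
cv_folds   <- 2
cv_reps    <- 5
n_mtry     <- 6      # length of the mtry tuning grid

# Each resample trains on (folds - 1) / folds of the rows.
train_rows <- n_rows * (cv_folds - 1) / cv_folds

# Roughly one model fit per resample per tuning value;
# these are the units that can be spread over workers.
n_fits <- cv_folds * cv_reps * n_mtry

# Size of one training set if stored as 8-byte doubles.
train_bytes <- train_rows * n_features * 8

cat("rows per training set:", train_rows, "\n")
cat("model fits to distribute:", n_fits, "\n")
cat("approx. GB per training set:", train_bytes / 1e9, "\n")
```

So each of the 4 workers still has to fit a full random forest on 50,000 x 2,000 values many times over; splitting by resample does not make any single fit smaller.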

I think the parallelism needs to be at the algorithm level, meaning that a single run of the algorithm is itself spread over several cores. Is such an algorithm implementation available in caret?
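
For illustration of what I mean by algorithm-level parallelism: outside caret, a random forest can be split into smaller sub-forests that are grown on separate cores and then merged. This is only a sketch, assuming the randomForest, foreach and doMC packages are installed; it uses the built-in iris data as a stand-in for my data, and it does not show whether caret itself exposes anything like this.

```r
library(foreach)
library(doMC)
library(randomForest)

registerDoMC(cores = 2)   # illustrative worker count

# Grow 4 sub-forests of 25 trees each in parallel, then merge them
# into a single 100-tree forest with randomForest::combine().
rf_par <- foreach(ntree = rep(25, 4),
                  .combine = randomForest::combine,
                  .packages = "randomForest") %dopar% {
  randomForest(x = iris[, -5], y = iris$Species, ntree = ntree)
}

print(rf_par$ntree)                      # total number of trees after merging
head(predict(rf_par, iris[, -5]))        # the merged forest predicts as usual
```

Note that the merged object no longer carries out-of-bag error estimates, so performance has to be measured on held-out data.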
