maksay

Reputation: 277

caret train rf model - inexplicably long execution

While trying to train a random forest model with the caret package, I noticed that the execution time is inexplicably long:

> set.seed(1);
> n = 500;
> m = 30;
> x = matrix(rnorm(n * m), nrow = n);
> y = factor(sample.int(2, n, replace = T), labels = c("yes", "no"))
> require(caret);
> require(randomForest);
> print(system.time({rf <- randomForest(x, y);}));
   user  system elapsed 
   0.99    0.00    0.98 
> print(system.time({rfmod <- train(x = x, y = y,
+                method = "rf",
+                metric = "Accuracy",
+                trControl = trainControl(classProbs = T)
+ );}));
   user  system elapsed 
  95.83    0.71   97.26 

I expected execution to take only about 10 times longer, since I assumed that 10-fold cross-validation is run by default instead of a single fit. I am not tuning any parameters explicitly, but it seems that train does so automatically:

> rfmod$results
  mtry  Accuracy       Kappa AccuracySD    KappaSD
1    2 0.4736669 -0.04437013 0.03323485 0.06493845
2   16 0.4818095 -0.03241901 0.03279341 0.06426745
3   30 0.4878361 -0.02149108 0.02956972 0.05936881

That would explain at most a 30-fold difference. However, it runs almost 100 times longer. What could explain this?

Thanks in advance

Upvotes: 4

Views: 4456

Answers (1)

topepo

Reputation: 14316

You are not specifying method in trainControl, so it defaults to 25 iterations of the bootstrap and, since tuneLength was also not set, you are doing it over 3 values of mtry.

A roughly 100-fold slowdown should not be unexpected when you multiply the computational cost by 75-fold (25 resamples x 3 values of mtry).
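
To make the arithmetic concrete, here is a minimal sketch that counts the model fits under caret's defaults (25 bootstrap resamples, 3 candidate values of mtry) and under the setup the question expected (10-fold cross-validation, a single mtry). The helper n_fits() is hypothetical and written only for this illustration; it is not part of caret. The second half, which runs only if caret and randomForest are installed, shows one way to request that cheaper setup explicitly. The column names and the choice mtry = floor(sqrt(m)) (randomForest's default for classification) are assumptions added here, not part of the original code.

```r
# Hypothetical helper (not a caret function): number of model fits implied by
# a resampling scheme and a tuning grid, plus the final fit on the full data.
n_fits <- function(n_resamples, n_candidates, final_fit = TRUE) {
  n_resamples * n_candidates + as.integer(final_fit)
}

# caret defaults: bootstrap with 25 resamples, tuneLength = 3
default_fits <- n_fits(n_resamples = 25, n_candidates = 3)
# What the question expected: 10-fold CV with one mtry value
single_cv_fits <- n_fits(n_resamples = 10, n_candidates = 1)
print(c(default = default_fits, cv_one_mtry = single_cv_fits))

# Optional: request the cheaper setup explicitly (skipped if packages are missing)
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  suppressPackageStartupMessages(library(caret))
  set.seed(1)
  n <- 500
  m <- 30
  x <- matrix(rnorm(n * m), nrow = n)
  colnames(x) <- paste0("V", seq_len(m))  # assumption: name the columns for train()
  y <- factor(sample.int(2, n, replace = TRUE), labels = c("yes", "no"))

  # 10-fold cross-validation instead of the default bootstrap
  ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE)
  # A single candidate value, so no tuning over mtry takes place
  grid <- data.frame(mtry = floor(sqrt(m)))

  print(system.time({
    rfmod_cv <- train(x = x, y = y, method = "rf", metric = "Accuracy",
                      trControl = ctrl, tuneGrid = grid)
  }))
  print(rfmod_cv$results)
}
```

The fit count is only a rough guide to run time: larger mtry values examine more candidate variables at each split, so the fits at mtry = 16 and mtry = 30 cost more than a default randomForest call, and train also spends time predicting on the held-out samples.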

Max

Upvotes: 10
