AdamNYC

Reputation: 20415

Improve run time for GBM package

I am building a GBM model with rather large datasets. data.table is great for processing the data, but when I run the GBM model, it takes forever to finish. Looking at Activity Monitor (on a Mac), I can see that the process doesn't use all of the memory and doesn't max out the processor.

Since gbm is single-core and I can't modify it to run on multiple cores, what are my options for improving run time? Right now I am using a MacBook Air with 4GB RAM and a 1.7GHz i5.

I am not sure which of the following options would help performance the most: (i) buying a computer with more memory; (ii) getting a more powerful chip (i7); or (iii) using Amazon AWS and installing R there. How would each of these help?

Sample code added per Brandson's request:

library(gbm)

# Tuning parameters
GBM_NTREES    = 100   # number of trees
GBM_SHRINKAGE = 0.05  # learning rate
GBM_DEPTH     = 4     # maximum interaction depth
GBM_MINOBS    = 50    # minimum observations per terminal node

GBM_model <- gbm.fit(
  x = data[, -target],     # predictor columns
  y = data[, target],      # response column
  # var.monotone = TRUE,   # NN added
  distribution = "gaussian",
  n.trees = GBM_NTREES,
  shrinkage = GBM_SHRINKAGE,
  interaction.depth = GBM_DEPTH,
  n.minobsinnode = GBM_MINOBS,
  verbose = TRUE
)

Upvotes: 2

Views: 3194

Answers (2)

user3720516

Reputation:

Something worth considering is the XGBoost library. According to the GitHub repo:

"XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solve many data science problems in a fast and accurate way."

I also realize the original question is quite old, but maybe this will help someone out down the road.
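As a rough starting point, here is a minimal sketch of an xgboost fit with settings loosely mirroring the gbm.fit() call in the question. The `data` and `target` names are the question's placeholders, and the parameter mapping (eta ~ shrinkage, max_depth ~ interaction.depth, min_child_weight ~ n.minobsinnode) is approximate, not exact:

library(xgboost)

# xgboost wants a numeric matrix, not a data.frame/data.table
X <- as.matrix(data[, -target])
y <- data[, target]
dtrain <- xgb.DMatrix(data = X, label = y)

params <- list(
  objective = "reg:squarederror",  # gaussian regression
                                   # (older xgboost versions call this "reg:linear")
  eta = 0.05,                      # ~ shrinkage
  max_depth = 4,                   # ~ interaction.depth
  min_child_weight = 50,           # ~ n.minobsinnode
  nthread = 4                      # parallel tree construction across cores
)

xgb_model <- xgb.train(params = params, data = dtrain,
                       nrounds = 100, verbose = 1)

The nthread parameter is what buys you the parallelism gbm lacks; on a 4-core machine the tree construction runs across all cores.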

Upvotes: 1

tmakino

Reputation: 648

This seems to be more about parallel computing in R in general than a specific question about gbm. I would start here: http://cran.r-project.org/web/views/HighPerformanceComputing.html. Even though a single gbm fit is single-threaded, independent fits can be farmed out to separate cores, as sketched below.
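For example, here is a minimal sketch using the parallel package (shipped with base R) to fit several gbm models concurrently, one per shrinkage value. The `data` and `target` names are the question's placeholders:

library(gbm)
library(parallel)

# Each gbm.fit() call is single-threaded, but independent fits
# (e.g., a small grid of shrinkage values) can run on separate cores.
shrinkages <- c(0.01, 0.05, 0.1)

fits <- mclapply(shrinkages, function(s) {
  gbm.fit(
    x = data[, -target],
    y = data[, target],
    distribution = "gaussian",
    n.trees = 100,
    shrinkage = s,
    interaction.depth = 4,
    n.minobsinnode = 50,
    verbose = FALSE
  )
}, mc.cores = detectCores())  # forking; mc.cores > 1 is not supported on Windows

Note that mclapply relies on forking, so it works on macOS and Linux but not Windows (use parLapply with a cluster there). This speeds up grid searches and cross-validation, not a single model fit.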

Upvotes: 0
