Reputation: 20415
I am building a GBM model with rather large datasets. data.table
is great for processing the data, but when I run the GBM model it takes forever to finish. Looking at Activity Monitor (on a Mac), I can see the process doesn't use all of the memory and doesn't max out the processor.
Since gbm is single-core and I can't modify it to run on multiple cores, what are my options to improve my run time? Right now I am using a MacBook Air with 4GB RAM and a 1.7GHz i5.
I am not sure which of the following options would help performance the most: (i) buying a computer with more memory; (ii) getting a more powerful chip (i7); or (iii) using Amazon AWS and installing R there. How would each of these help?
Adding sample code per Brandson's request:
library(gbm)

GBM_NTREES = 100
GBM_SHRINKAGE = 0.05
GBM_DEPTH = 4
GBM_MINOBS = 50

GBM_model <- gbm.fit(
  x = data[, -target],
  y = data[, target],
  # var.monotone = TRUE, # NN added
  distribution = "gaussian",
  n.trees = GBM_NTREES,
  shrinkage = GBM_SHRINKAGE,
  interaction.depth = GBM_DEPTH,
  n.minobsinnode = GBM_MINOBS,
  verbose = TRUE)
Upvotes: 2
Views: 3194
Reputation:
Maybe something worth considering is using the XGBoost library. According to the GitHub repo:
"XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solve many data science problems in a fast and accurate way."
I also realize the original question is quite old, but maybe this will help someone out down the road.
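For anyone wanting to try that from R, below is a minimal sketch of a roughly comparable regression fit with the xgboost package. The matrix x, response y, and the tuning values are carried over from the question as assumptions; the xgboost parameter names (eta, max_depth, min_child_weight, nthread) are rough analogues of the gbm.fit() arguments, not exact equivalents, and "reg:squarederror" assumes a recent xgboost version (older releases used "reg:linear").

library(xgboost)

# Assumes x is a numeric matrix (e.g. via as.matrix or model.matrix)
# and y is a numeric response vector, as in the question.
dtrain <- xgb.DMatrix(data = x, label = y)

xgb_model <- xgb.train(
  params = list(
    objective = "reg:squarederror",  # gaussian-style regression loss
    eta = 0.05,                      # roughly analogous to shrinkage
    max_depth = 4,                   # roughly analogous to interaction.depth
    min_child_weight = 50,           # loosely analogous to n.minobsinnode
    nthread = 4                      # build trees on multiple cores
  ),
  data = dtrain,
  nrounds = 100,                     # analogous to n.trees
  verbose = 1
)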
Upvotes: 1
Reputation: 648
This seems to be more of a general question about parallel computing in R than a question about gbm specifically. I would start with the CRAN High-Performance Computing task view: http://cran.r-project.org/web/views/HighPerformanceComputing.html. A rough sketch of the idea follows.
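As one illustration of the kind of approach that task view covers, here is a sketch of running several independent gbm fits (here, different shrinkage values) in parallel with foreach/doParallel. Each individual gbm.fit() call is still single-threaded; the speedup comes from running the fits side by side. The data, target, and tuning values are assumptions carried over from the question.

library(gbm)
library(foreach)
library(doParallel)

# Start a cluster with one worker per core, leaving one core free.
cl <- makeCluster(parallel::detectCores() - 1)
registerDoParallel(cl)

shrinkages <- c(0.01, 0.05, 0.1)   # hypothetical values to try in parallel

models <- foreach(s = shrinkages, .packages = "gbm") %dopar% {
  gbm.fit(
    x = data[, -target],
    y = data[, target],
    distribution = "gaussian",
    n.trees = 100,
    shrinkage = s,
    interaction.depth = 4,
    n.minobsinnode = 50,
    verbose = FALSE
  )
}

stopCluster(cl)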
Upvotes: 0