Shaun Lebron

Reputation: 2541

H2O GBM slows down when adding more cores

Training the following GBM model on 2 cores vs. 96 cores (EC2 c5.large and c5.metal) is faster with fewer cores. I checked H2O's Water Meter to verify that all cores were running.

Training times:

c5.large (2 cores): ~1min
c5.metal (96 cores): ~2min

Training details:

training set size     6840 rows x 95 cols

seed                  1
ntrees                1000
max_depth             50
min_rows              10
learn_rate            0.005
sample_rate           0.5
col_sample_rate       0.5
stopping_rounds       2
stopping_metric       "MSE"
stopping_tolerance    1.0E-5
score_tree_interval   500
histogram_type        "UniformAdaptive"
nbins                 800
nbins_top_level       1024

Any thoughts on why this is happening?

Upvotes: 1

Views: 94

Answers (1)

Maurever

Reputation: 157

I think the reason is that the total time of a parallel run is made up of two main components:

  1. compute time on each individual core
  2. communication time spent distributing work and collecting results

If you have a small dataset and a lot of cores, each core gets very little work, so the communication overhead can outweigh the compute savings and the algorithm slows down. Try, for example, 4, 6, or 10 cores instead of 96 to speed it up.
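To make the trade-off concrete, here is a minimal sketch of a toy cost model. It is not how H2O schedules work internally; the function names and the constants (`work`, `overhead_per_core`, `serial`) are hypothetical and only illustrate how a compute term that shrinks with the core count and a communication term that grows with it produce an optimum somewhere in the middle.

```python
def estimated_time(cores, work=100.0, overhead_per_core=1.0, serial=0.0):
    """Toy model of wall-clock time (arbitrary units, not measured data).

    work              -- total parallelizable compute, split evenly across cores
    overhead_per_core -- assumed coordination cost added by each extra core
    serial            -- part of the job that does not parallelize at all
    """
    return serial + work / cores + overhead_per_core * cores


def best_core_count(max_cores, **model_params):
    """Return the core count in 1..max_cores with the lowest estimated time."""
    return min(
        range(1, max_cores + 1),
        key=lambda n: estimated_time(n, **model_params),
    )


if __name__ == "__main__":
    # Compare a few core counts under the assumed constants.
    for n in (2, 4, 6, 10, 96):
        print(f"{n:>3} cores -> {estimated_time(n):.1f} time units")
    print("best core count:", best_core_count(96))
```

With these made-up constants, going far past the optimum is slower than using very few cores, which matches the shape of what you are seeing. In practice you would find the sweet spot empirically, for example by limiting the threads H2O may use with the `nthreads` argument of `h2o.init` and timing each run.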

Upvotes: 3
