python dev

Reputation: 219

H2O single node Vs cluster

I have recently started learning about H2O AutoML, and I am wondering which of the following options performs better: a single node with 6 GB of memory, or a cluster of three nodes with 2 GB of memory each?

  1. java -Xmx6g -jar h2o.jar -name MyCluster
  2. java -Xmx2g -jar h2o.jar & java -Xmx2g -jar h2o.jar & java -Xmx2g -jar h2o.jar &

If a single-node deployment has drawbacks, can you recommend any ways to optimize its performance? Thanks in advance!

Upvotes: 2

Views: 407

Answers (2)

Erin LeDell

Reputation: 8819

Running H2O on a single node is always better (when possible) because there's communication overhead between the cluster nodes. Models will train faster on a single node.
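For readers working from Python, here is a minimal sketch of starting the single-node setup through the `h2o` package instead of the command line. It assumes the `h2o` package and a Java runtime are installed; the helper name `start_single_node` is made up for illustration, and the 6 GB figure simply mirrors option 1 in the question.

```python
def start_single_node(max_mem="6G", threads=-1):
    """Start (or connect to) one local H2O node and return its cluster handle.

    Assumption: the h2o package and Java are installed. The import is kept
    inside the function so the sketch can be loaded without h2o present.
    """
    import h2o

    # max_mem_size sets the JVM heap for a locally started node
    # (comparable to -Xmx6g); nthreads=-1 lets H2O use all available cores.
    h2o.init(max_mem_size=max_mem, nthreads=threads)
    return h2o.cluster()
```

Calling `start_single_node()` and then `.show_status()` on the result should report a cluster of size 1, which is the configuration recommended above.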

Upvotes: 0

pveentjer

Reputation: 11392

My guess is that the first approach will perform better because it involves less context switching. I'm not very familiar with H2O, but I assume it starts one thread per core. If so, running three H2O instances on the same machine gives you three threads per core, which increases the number of context switches and therefore reduces performance.
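The arithmetic behind that guess can be sketched as follows. This is only an illustration under the stated assumption that each instance starts one worker thread per core unless capped (H2O accepts an `-nthreads` launch option for that); the function name and the 8-core figure are hypothetical.

```python
import os


def threads_per_core(instances, cores=None, nthreads=None):
    """Estimate worker threads per core when several instances share a machine.

    Assumption (a guess, not verified H2O behavior): each instance starts one
    worker thread per core unless nthreads caps it.
    """
    cores = cores or os.cpu_count() or 1
    per_instance = nthreads if nthreads is not None else cores
    return instances * per_instance / cores


# Hypothetical 8-core machine:
print(threads_per_core(1, cores=8))              # one instance: 1.0
print(threads_per_core(3, cores=8))              # three instances: 3.0
print(threads_per_core(3, cores=8, nthreads=2))  # three capped instances: 0.75
```

Any value above 1.0 means the cores are oversubscribed, which is where the extra context switching would come from.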

I'm also fairly sure that H2O can work with very large amounts of memory. It can pool the arrays it creates, so the actual data should not need much garbage collection.

Upvotes: 2
