
Reputation: 219

H2O single node Vs cluster

I have recently started learning about H2O AutoML. I am wondering which of the following options works better: a single node with 6 GB of memory, or a cluster of three nodes with 2 GB of memory each.

java -Xmx6g -jar h2o.jar -name MyCluster

java -Xmx2g -jar h2o.jar &
java -Xmx2g -jar h2o.jar &
java -Xmx2g -jar h2o.jar &

If there are drawbacks with single node deployment, can you recommend any methods to optimize the performance? Thanks in advance!

Upvotes: 2

Answers (2)

Erin LeDell

Reputation: 8819

Running H2O on a single node is always better (when possible) because there's communication overhead between the cluster nodes. Models will train faster on a single node.
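To make the comparison concrete, here is a minimal dry-run sketch that only prints the launch commands for both layouts instead of starting anything. The helper name `h2o_launch_cmd` is hypothetical; `-Xmx`, `-jar`, and `-name` are the flags already used in the question. It assumes that all nodes of one cluster are given the same `-name`, which the three-node commands in the question leave to the default.

```shell
#!/bin/sh
# Dry run: print H2O launch commands instead of executing them.
# h2o_launch_cmd is a hypothetical helper, not part of H2O.
h2o_launch_cmd() {
    heap="$1"      # JVM max heap, e.g. 6g
    cluster="$2"   # value passed to the -name flag
    echo "java -Xmx${heap} -jar h2o.jar -name ${cluster}"
}

# Option 1: one node with the whole 6 GB heap.
h2o_launch_cmd 6g MyCluster

# Option 2: three nodes with 2 GB each, all sharing one cluster name
# (sharing the name is an assumption of this sketch).
for node in 1 2 3; do
    echo "$(h2o_launch_cmd 2g MyCluster) &"
done
```

The total heap is 6 GB either way, so the difference between the two options comes down to the inter-node communication overhead mentioned above.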

Upvotes: 0

pveentjer

Reputation: 11392

My guess is that the first approach will perform better because of less context switching. I'm not too familiar with H2O, but I assume it starts one thread per core. If you run three H2O instances on the same machine, you get three threads per core, which increases the number of context switches and reduces performance.

And I'm pretty sure that H2O can work with huge amounts of memory. They can pool the created arrays, so there should not be too much need for garbage collection for the actual data.
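The context-switching argument can be put in numbers. This is only a sketch: it assumes, as guessed above, that each H2O instance starts one worker thread per core, and `worker_threads` and `threads_per_instance` are hypothetical helpers. If several instances really must share a machine, H2O's `-nthreads` launch option is one way to cap the threads per instance.

```shell
#!/bin/sh
# Total worker threads if every instance starts one thread per core
# (an assumption taken from the answer above, not verified H2O behaviour).
worker_threads() {
    cores="$1"
    instances="$2"
    echo $((cores * instances))
}

# Threads to allow each instance so the total does not exceed the core count.
threads_per_instance() {
    cores="$1"
    instances="$2"
    per=$((cores / instances))
    if [ "$per" -lt 1 ]; then
        per=1   # never go below one thread per instance
    fi
    echo "$per"
}

worker_threads 4 1         # single node on 4 cores: prints 4
worker_threads 4 3         # three nodes on 4 cores: prints 12
threads_per_instance 4 3   # cap per node on 4 cores: prints 1
```

The per-instance value could then be passed as `-nthreads` when launching each node, although a single node avoids the problem altogether.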

Upvotes: 2
