Shean

Memory Management in H2O

I am curious to know how memory is managed in H2O. Is it completely 'in-memory', or does it allow swapping to disk when memory consumption exceeds the available physical memory? Can I set the -mapperXmx parameter to 350GB if I have a total of 384GB of RAM on a node? I do realise that the cluster won't be able to run anything other than H2O in this case. Any pointers are much appreciated. Thanks.

Answers (1)

TomKraljevic

  1. H2O-3 stores data completely in memory, in a distributed, column-compressed key-value store.

  2. No swapping to disk is supported.

  3. Since you are alluding to mapperXmx, I assume you are talking about running H2O in a YARN environment. In that case, the total YARN container size allocated per node is:

    mapreduce.map.memory.mb = mapperXmx * (1 + extramempercent/100)

extramempercent is another (rarely used) command-line parameter to h2odriver.jar; its default value is 10 (percent).
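Plugging the numbers from the question into the formula above shows why a 350GB heap is already too large once the default overhead is added. This is a minimal Python sketch, assuming sizes in GB and the default extramempercent of 10; the function name `container_size` is hypothetical and only mirrors the formula.

```python
def container_size(mapper_xmx: float, extramempercent: float = 10.0) -> float:
    """Total YARN container size for one H2O node (hypothetical helper).

    mapper_xmx is the Java heap; extramempercent adds JVM overhead on top,
    following mapreduce.map.memory.mb = mapperXmx * (1 + extramempercent/100).
    """
    return mapper_xmx * (1 + extramempercent / 100)


node_ram_gb = 384
requested = container_size(350)  # 350 GB heap plus the default 10% overhead
print(f"container needs {requested:.1f} GB of {node_ram_gb} GB")
print("fits" if requested <= node_ram_gb else "does not fit")
```

The container would need about 385GB, more than the node's 384GB of RAM, and that is before leaving any memory for the operating system and the Hadoop daemons running on the node.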

mapperXmx is the size of the Java heap, and the extra memory referred to above is for additional overhead of the JVM implementation itself (e.g. the C/C++ heap).

YARN is extremely picky about this, and if your container tries to use even one byte over its allocation (mapreduce.map.memory.mb), YARN will immediately terminate the container. (And for H2O-3, since it's an in-memory processing engine, the loss of one container terminates the entire job.)
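The two rules in the paragraph above (any usage over the allocation gets the container killed, and losing any container ends the H2O job) can be captured in a toy Python model. This is only an illustration of the stated behaviour, not YARN's actual NodeManager monitoring code; the `Container` class, the helper names, and the MB figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    allocated_mb: int  # mapreduce.map.memory.mb for this container
    used_mb: int       # memory actually used by the JVM (heap + overhead)


def killed_by_yarn(containers: list[Container]) -> list[str]:
    """Names of containers YARN would terminate: any usage above allocation."""
    return [c.name for c in containers if c.used_mb > c.allocated_mb]


def h2o_job_survives(containers: list[Container]) -> bool:
    """H2O-3 keeps data only in memory, so losing any one container ends the job."""
    return not killed_by_yarn(containers)


# 10 GB heap * 1.1 = 11264 MB allocated per container (hypothetical sizes).
cluster = [
    Container("node-1", allocated_mb=11264, used_mb=11000),
    Container("node-2", allocated_mb=11264, used_mb=11265),  # one MB over
]
print(killed_by_yarn(cluster))    # ['node-2']
print(h2o_job_survives(cluster))  # False
```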

You can set mapperXmx and extramempercent as large as you like, as long as YARN has enough free memory on each node to start containers of the resulting size.
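Working the formula backwards gives the largest heap that still fits. A minimal Python sketch, assuming the memory YARN may hand out on a node is what you pass in (in YARN this is normally governed by yarn.nodemanager.resource.memory-mb, with a single container further capped by yarn.scheduler.maximum-allocation-mb); the helper name and the 360GB figure, chosen to leave some RAM for the OS, are hypothetical.

```python
def max_mapper_xmx_mb(yarn_capacity_mb: int, extramempercent: int = 10) -> int:
    """Largest mapperXmx (MB) whose container still fits in yarn_capacity_mb.

    Inverts mapreduce.map.memory.mb = mapperXmx * (1 + extramempercent/100),
    rounding down so the container never exceeds the capacity.
    """
    # Integer arithmetic avoids floating-point rounding right at the boundary.
    return (yarn_capacity_mb * 100) // (100 + extramempercent)


# Hypothetical: YARN is allowed 360 GB of the node's 384 GB.
print(max_mapper_xmx_mb(360 * 1024))  # 335127 (about 327 GB of heap)
```

Rounding down is deliberate: since YARN kills a container that exceeds its allocation, erring on the small side is the safe choice.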
