Dealing with a 150GB heap in a non-interactive application

Question

Hello, I have a program that uses an in-memory data grid and needs a 150GB heap. The operations department has a crazy requirement that it run on a single machine. We all know what happens if the parallel garbage collector is used on a heap of more than 150GB: a full GC would probably take tens of minutes.

My hope was that the low-pause Shenandoah GC would arrive with Java 9. Unfortunately, from what I can see, it is not listed for delivery in Java 9. Does anyone know anything about that?

Nevertheless, I am wondering how G1 GC will perform with this amount of heap memory.

And one last question. I have a non-interactive batch application that is supposed to complete in, let's say, 2 hours. The main goal is to ensure that a full GC never kicks in. If I provide plenty of memory, say heap usage can peak at 150GB and I allocate 250GB, can I say with good confidence that a full GC will never kick in? Usually a full GC is triggered when the new generation plus the old generation reach the maximum heap size. Can it be triggered in a different way?

A duplicate flag has been raised, so I will try to explain why this question is not a duplicate. First, we are talking about a 150GB heap, which adds a completely different dimension to the question. Second, I don't use RMI, which the other question mentions. Third, I am asking about the G1 garbage collector between the lines. Also, once we go beyond the 32GB heap barrier we are entering the 64-bit address space; you cannot convince me that a question about a <32GB heap is the same as a question about a >32GB heap. Not to mention that things have changed a bit since Java 7; for instance, PermSpace no longer exists.

the8472 · Accepted Answer

The rule of thumb for a compacting GC is that it should be able to process 1 GB of live objects per core per second.

Example on a Haswell i7 (4 cores/8 threads) with a 20GB heap and the parallel collector:

[24.757s][info][gc,heap        ] GC(109) PSYoungGen: 129280K->0K(917504K)
[24.757s][info][gc,heap        ] GC(109) ParOldGen: 19471666K->7812244K(19922944K)
[24.757s][info][gc             ] GC(109) Pause Full (Ergonomics) 19141M->7629M(20352M) (23.791s, 24.757s) 966.174ms
[24.757s][info][gc,cpu         ] GC(109) User=6.41s Sys=0.02s Real=0.97s

The live set after compaction is 7.6GB. The collection takes 6.4 seconds of CPU time which, thanks to parallelism, translates to a pause of less than 1 second.

In principle the parallel collector should be able to handle a 150GB heap with full GC times < ~2 minutes on a multi-core system, even when most of the heap consists of live objects.
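The arithmetic behind this estimate can be sketched roughly as follows. The class and method names are hypothetical and the 1 GB per core per second rate is only the rule of thumb quoted above, so treat the numbers as ballpark figures rather than predictions.

```java
public class GcPauseEstimate {
    // Rule of thumb from the answer: about 1 GB of live objects per core per second.
    static final double RULE_OF_THUMB_GB_PER_CORE_SECOND = 1.0;

    /** Estimated full GC pause in seconds for a live set spread over the given cores. */
    static double estimatePauseSeconds(double liveSetGb, int cores, double gbPerCoreSecond) {
        if (liveSetGb < 0 || cores <= 0 || gbPerCoreSecond <= 0) {
            throw new IllegalArgumentException("live set must be >= 0, cores and rate > 0");
        }
        return liveSetGb / (cores * gbPerCoreSecond);
    }

    /** Observed per-core rate: live data divided by the total GC CPU time. */
    static double observedGbPerCoreSecond(double liveSetGb, double gcCpuSeconds) {
        return liveSetGb / gcCpuSeconds;
    }

    public static void main(String[] args) {
        // Figures taken from the log above: 7.6 GB live after compaction, User=6.41s.
        double observed = observedGbPerCoreSecond(7.6, 6.41);
        System.out.printf("observed rate: %.2f GB per core-second%n", observed);

        // Worst case from the question: the whole 150 GB heap is live.
        for (int cores : new int[] {2, 4, 8, 16, 32}) {
            double pause = estimatePauseSeconds(150.0, cores, RULE_OF_THUMB_GB_PER_CORE_SECOND);
            System.out.printf("%2d cores: about %.0f s full GC pause%n", cores, pause);
        }
    }
}
```

Even with only 2 cores the estimate comes out at 75 seconds, which is where the "under 2 minutes" figure comes from.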

Of course this is just a rule of thumb. Some things that can affect it negatively:

paging
thermal CPU throttling
workloads consisting of very large, reference-heavy objects
non-local memory traffic in NUMA configurations
other processes competing for CPU time
heavy use of weak/soft references

In some cases tuning may be necessary to achieve this throughput.

If the parallel collector does not work despite all that, then CMS and G1 can be viable alternatives, but only if there is enough spare heap capacity and enough CPU cores available to the JVM. They need significant breathing room to do their concurrent work without risking a full GC.
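As a rough way to put a number on "spare heap capacity", the sketch below computes the share of the maximum heap that is left unused. The class and method names are hypothetical, and how much headroom a concurrent collector actually needs depends on the workload, so no threshold is implied here.

```java
public class HeapHeadroom {
    /** Fraction of the maximum heap that is not occupied by the given usage. */
    static double headroomFraction(double usedGb, double maxGb) {
        if (usedGb < 0 || maxGb <= 0) {
            throw new IllegalArgumentException("used must be >= 0 and max > 0");
        }
        return Math.max(0.0, (maxGb - usedGb) / maxGb);
    }

    public static void main(String[] args) {
        // Sizing proposed in the question: peak usage of 150 GB in a 250 GB heap.
        System.out.printf("planned headroom: %.0f%%%n", headroomFraction(150.0, 250.0) * 100);

        // Snapshot of the running JVM. "used" also counts garbage that has not
        // been collected yet, so this is an upper bound on the live set.
        Runtime rt = Runtime.getRuntime();
        double bytesPerGb = 1024.0 * 1024.0 * 1024.0;
        double usedGb = (rt.totalMemory() - rt.freeMemory()) / bytesPerGb;
        double maxGb = rt.maxMemory() / bytesPerGb;
        System.out.printf("current headroom: %.0f%%%n", headroomFraction(usedGb, maxGb) * 100);
    }
}
```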

It is correct that I said non-interactive, but I still have a strict license agreement. I need to be finished with the whole processing in an hour, so I cannot afford a 30-minute stop-the-world event.

Basically, you don't really need low pause times in the sense that CMS, G1, Shenandoah or Zing aim for (they aim for <100ms or even <10ms even on large heaps).

All you need is that STW pauses are not so catastrophically bad that they eat a significant portion of your compute time.

This should be feasible with most of the available collectors, ignoring the serial one.

In practice there are some pathological edge cases where they may fall down, but to get to that point you need to set up a system with your actual workload and do some test runs. If you experience real problems, you can then ask a question with more details.
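For such test runs, one simple way to check how much of the run time goes to garbage collection is to read the standard `GarbageCollectorMXBean` counters before and after the batch. This is a minimal sketch: the class name, the `measure` helper and the stand-in workload are hypothetical, and for concurrent collectors the reported collection time may not correspond exactly to stop-the-world pause time, so GC logs remain the more precise source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    /** Sum of the approximate accumulated collection time over all collectors, in ms. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Share of wall-clock time spent in GC; 0 when no time has elapsed. */
    static double gcFraction(long gcMillis, long wallMillis) {
        return wallMillis <= 0 ? 0.0 : (double) gcMillis / wallMillis;
    }

    /** Runs the workload and returns the GC share of its wall-clock time. */
    static double measure(Runnable workload) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        workload.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        return gcFraction(totalGcMillis() - gcBefore, wallMillis);
    }

    public static void main(String[] args) {
        double fraction = measure(() -> {
            // Hypothetical stand-in for the real batch job: allocate short-lived arrays.
            long checksum = 0;
            for (int i = 0; i < 200_000; i++) {
                byte[] chunk = new byte[16 * 1024];
                chunk[0] = (byte) i;
                checksum += chunk[0];
            }
            System.out.println("checksum: " + checksum);
        });
        System.out.printf("GC share of wall time: %.2f%%%n", fraction * 100);
    }
}
```

If the reported share stays at a few percent of a one-hour run, the pauses are not eating a significant portion of the compute time, which is the only requirement stated above.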
