Pavel Moukhataev
Pavel Moukhataev

Reputation: 309

G1 GC — extremely long termination time

G1 GC sometimes spends a lot of time in termination phase. As you can see while GC worker average time is 442.9 termination is 327.3. That is high performance low latency application that processes a lot of messages. Event processing data must be got collected by yong gc. Usually event processing takes no more than 150 ms.

HW: x24 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 32Gb mem, 8Gb mem is free JVM Options: -server -XX:+UseG1GC -XX:MaxGCPauseMillis=60 -XX:ParallelGCThreads=24 -Xmx12g -Xms12g -Xss256k Linux 2.6.32-696.6.3.el6.x86_64 java version "1.8.0_102"

Typically G1 sets new size to 7Gb, young GC pause is 50ms, termination time is just 6ms, interval between collections is 3 sec.

If application produces a lot of long-living objects then young pause can increase and that also results in decreasing young GC size. But in that case termination time remains the same. And I wonder why termination time can increase so drammatically.

Please note also big system time for this pause. Usually system time for GC pause is 0-10ms. For this one it is 1 sec.

2017-08-23T13:21:37.690-0400: 1509.022: [GC pause (G1 Evacuation Pause) (young), 0.4474119 secs]
   [Parallel Time: 443.2 ms, GC Workers: 24]
      [GC Worker Start (ms): Min: 1509022.9, Avg: 1509023.0, Max: 1509023.4, Diff: 0.4]
      [Ext Root Scanning (ms): Min: 2.1, Avg: 22.5, Max: 90.4, Diff: 88.3, Sum: 539.7]
      [Update RS (ms): Min: 0.0, Avg: 2.2, Max: 5.0, Diff: 5.0, Sum: 54.0]
         [Processed Buffers: Min: 0, Avg: 24.7, Max: 67, Diff: 67, Sum: 592]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.7]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 26.1, Avg: 106.1, Max: 435.6, Diff: 409.6, Sum: 2545.3]
      [Termination (ms): Min: 0.0, Avg: 312.0, Max: 327.3, Diff: 327.3, Sum: 7488.6]
         [Termination Attempts: Min: 1, Avg: 1.5, Max: 4, Diff: 3, Sum: 36]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.6]
      [GC Worker Total (ms): Min: 442.5, Avg: 442.9, Max: 443.0, Diff: 0.5, Sum: 10628.9]
      [GC Worker End (ms): Min: 1509465.9, Avg: 1509465.9, Max: 1509465.9, Diff: 0.1]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 3.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.5 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.4 ms]
      [Humongous Register: 0.2 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 0.3 ms]
   [Eden: 476.0M(476.0M)->0.0B(556.0M) Survivors: 136.0M->56.0M Heap: 4480.6M(12.0G)->4016.3M(12.0G)]
 [Times: user=4.23 sys=1.08, real=0.45 secs]

Upvotes: 2

Views: 1619

Answers (1)

Laxman Prabhu
Laxman Prabhu

Reputation: 96

I have a suggestion. Sometimes when there are other processes running on the system, they contend with your Java GC workers.

[Object Copy (ms): Min: 26.1, Avg: 106.1, Max: 435.6, Diff: 409.6, Sum: 2545.3] [Ext Root Scanning (ms): Min: 2.1, Avg: 22.5, Max: 90.4, Diff: 88.3, Sum: 539.7]

From the above logs, it looks like the GC workers are doing a dis proportionate amount of work. This means most GC threads finish early and a few of them are still working. When this happens the completed threads try to steal work from others. This is reflected in the termination time. As a result termination time is not the cause, but just a side effect.

[Termination (ms): Min: 0.0, Avg: 312.0, Max: 327.3, Diff: 327.3, Sum: 7488.6]

You have a 24 core machine and have 24 GC worker threads. If there are other processes or system activity which needs CPU resources, chances are that some of the GC workers are scheduled late because of contention.

Its worth trying to reduce your number of threads to 18 and give it a shot.

You can also increase the gc log level, when you do this it prints the start time of each of the threads. This can be helpful to debug the application.

Upvotes: 2

Related Questions