Reputation: 341
I am trying to use ZGC in a production environment, so I upgraded the JDK from JDK 8 to OpenJDK 15, Tomcat 8 to Tomcat 8.5, and adjusted the GC-related options, but CPU usage climbs to 1000+% a few hours after the JVM starts (normal CPU usage is 100-300%). Sometimes, when the CPU is high, there are many ICBufferFull safepoints in the log file:
[2020-11-12T19:00:43.669+0800] Safepoint "ICBufferFull", Time since last: 41374119 ns, Reaching safepoint: 2026134 ns, At safepoint: 85993 ns, Total: 2112127 ns
[2020-11-12T19:00:43.672+0800] Safepoint "ICBufferFull", Time since last: 2521598 ns, Reaching safepoint: 1109894 ns, At safepoint: 57235 ns, Total: 1167129 ns
[2020-11-12T19:00:43.676+0800] Safepoint "ICBufferFull", Time since last: 2892867 ns, Reaching safepoint: 1240834 ns, At safepoint: 59633 ns, Total: 1300467 ns
[2020-11-12T19:00:43.681+0800] Safepoint "ICBufferFull", Time since last: 2870110 ns, Reaching safepoint: 1425837 ns, At safepoint: 54052 ns, Total: 1479889 ns
If I take the node offline for about 30 seconds, the CPU goes down; after bringing it back online, it works normally for hours until the CPU goes high again. I checked the source code: ICBufferFull means the inline cache buffers are full, but I cannot find an option to increase their size. Could anyone help with this? Thanks!
the gc options are:
export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=4 -Xss2m -XX:+UseZGC -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m -XX:+HeapDumpOnOutOfMemoryError'
edit1:
As a reference, I have another host that still runs on JDK 8 with CMS and works well; the requests to the two hosts are almost identical.
I profiled it with async-profiler. The hottest Java method is java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss, which accounts for 50+% of samples; the hottest native method is ZMark::try_mark_object(ZMarkCache*, unsigned long, bool), which accounts for only 0.41%. I compared the ThreadLocal-related source code in JDK 8 and OpenJDK 15 and it seems unchanged.
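For context, the classic way getEntryAfterMiss becomes hot is when short-lived ThreadLocal instances are set on long-lived pool threads and never removed, leaving stale weak-keyed entries that every missing lookup has to probe past. A minimal, hypothetical sketch of that pattern (not my application's code) would be:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalLeakDemo {
    public static void main(String[] args) {
        // Long-lived worker threads, as in a servlet container's pool.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1_000_000; i++) {
            pool.submit(() -> {
                // A new ThreadLocal per task; once the task ends, the entry
                // stays in the worker's ThreadLocalMap as a stale weak-keyed
                // slot because remove() is never called.
                ThreadLocal<byte[]> scratch = new ThreadLocal<>();
                scratch.set(new byte[128]);
                // scratch.remove() here would prevent the accumulation.
            });
        }
        pool.shutdown();
    }
}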
edit2:
I found a similar bug on JIRA; my application is Lucene-related too, and from the GC log, CPU usage is high whenever the weak reference count is 1M+. So the question becomes: how can I get ZGC to collect weak references more aggressively?
edit3:
From the following logs, with System.gc() called every 3 seconds, it seems that my application produces too many weak references. Strangely, the number of weak references keeps increasing after startup even though the request rate is almost constant from 11:00 to 17:00. Note that CPU dropped from 600% to 400% automatically after GC(9821), where ~250K references were enqueued. At GC(10265) the node was taken offline and ~770K were enqueued. Why does the enqueued count stay so small for so long; does it mean the objects are not being fully reclaimed?
[2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658 encountered, 72334 discovered, 0 enqueued
[2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462 encountered, 122216 discovered, 1380 enqueued
[2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598 encountered, 107228 discovered, 677 enqueued
[2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536 encountered, 82199 discovered, 1713 enqueued
[2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946 encountered, 291651 discovered, 292 enqueued
[2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065 encountered, 124351 discovered, 815 enqueued
[2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932 discovered, 353 enqueued
[2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369 discovered, 4648 enqueued
[2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757 encountered, 928748 discovered, 1691 enqueued
[2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080 encountered, 841168 discovered, 247352 enqueued
[2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253 encountered, 568564 discovered, 3938 enqueued
[2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081 encountered, 788825 discovered, 767288 enqueued
[2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876 encountered, 18186 discovered, 1 enqueued
Upvotes: 4
Views: 3723
Reputation: 341
This issue finally turned out to be a JDK issue that is supposed to be resolved in JDK 16. It can be bypassed with a thread pool that terminates old threads and creates new ones periodically. I moved my application to Jetty for other reasons; the modified Jetty thread pool is here. It has now been working perfectly for days; take this approach if anyone hits the same problem.
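A minimal sketch of the idea, using a plain JDK ExecutorService instead of my actual modified Jetty pool (the pool size and one-hour rotation interval below are placeholders, not what I run in production):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Swap in a fresh worker pool periodically and let the old one drain, so
// long-lived threads (and their ThreadLocalMaps full of weak references)
// are discarded and replaced.
public class RotatingPool {
    private final AtomicReference<ExecutorService> current =
            new AtomicReference<>(Executors.newFixedThreadPool(16));

    public void startRotation(ScheduledExecutorService scheduler) {
        scheduler.scheduleAtFixedRate(() -> {
            ExecutorService old = current.getAndSet(Executors.newFixedThreadPool(16));
            old.shutdown(); // old threads exit after finishing queued tasks
        }, 1, 1, TimeUnit.HOURS);
    }

    public void submit(Runnable task) {
        current.get().execute(task);
    }
}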
Upvotes: 2
Reputation:
Modern GCs delay collection of weakly-reachable objects. Collections triggered by System.gc() always process weakly-reachable objects and are concurrent by default, so you could implement a periodic task that invokes the method.
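A minimal sketch of such a periodic task, assuming a dedicated scheduler thread (the 3-second interval matches the experiment in the question and is not a recommendation):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodically trigger an explicit GC so weakly-reachable objects are
// processed; with ZGC the cycle runs concurrently, so the cost is mostly
// extra GC CPU rather than a long pause.
public class PeriodicGc {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(System::gc, 3, 3, TimeUnit.SECONDS);
    }
}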
Upvotes: 1