Mark
Mark

Reputation: 443

How to deal with long Full Garbage Collection cycle in Java

We inherited a system which runs in production and started to fail every 10 hours recently. Basically, our internal software marks the system that is has failed if it is unresponsive for a minute. We found that our problem that our Full GC cycles last for 1.5 minutes, we use 30 GB heap. Now the problem is that we cannot optimize a lot in a short period of time and we cannot partition of our service quickly but we need to get rid of 1.5 minutes pauses as soon as possible as our system fails because of these pauses in production. For us, an acceptable delay is 20 milliseconds but not more. What will be the quickest way to tweak the system? Reduce the heap to trigger GCs frequently? Use System.gc() hints? Any other solutions? We use Java 8 default settings and we have more and more users - i.e. more and more objects created.

Some GC stat

enter image description here

Upvotes: 3

Views: 5372

Answers (3)

dan.m was user2321368
dan.m was user2321368

Reputation: 1705

There is no one-size-fits-all magic bullet solution to your problem: you'll need to have a good handle on your application's allocation and liveness patterns, and you'll need to know how that interacts with the specific garbage collection algorithm you are running (function of version of Java and command line flags passed to java).

Broadly speaking, a Full GC (that succeeds in reclaiming lots of space) means that lots of objects are surviving the minor collections (but aren't being leaked). Start by looking at the size of your Eden and Survivor spaces: if the Eden is too small, minor collections will run very frequently, and perhaps you aren't giving an object a chance to die before its tenuring threshold is reached. If the Survivors are too small, objects are going to be promoted into the Old gen prematurely.

GC tuning is a bit of an art: you run your app, study the results, tweak some parameters, and run it again. As such, you will need a benchmark version of your application, one which behaves as close as possible to the production one but which hopefully doesn't need 10 hours to cause a full GC.

As you stated that you are running Java 8 with the default settings, I believe that means that your Old collections are running with a Serial collector. You might see some very quick improvements by switching to a Parallel collector for the Old generation (-XX:+UseParallelOldGC). While this might reduce the 1.5 minute pause to some number of seconds (depending on the number of cores on your box, and the number of threads you specify for GC), this will not reduce your max pause to to 20ms.

Upvotes: 3

Peter Lawrey
Peter Lawrey

Reputation: 533492

You have a lot of retained data. There is a few options which are worth considering.

  • increase the heap to 32 GB, this has little impact if you have free memory. Looking again at your totals it appears you are using 32 GB rather than 30 GB, so this might not help.
  • if you don't have plenty of free memory, it is possible a small portion of your heap is being swapped as this can increase full GC times dramatically.
  • there might be some simple ways to make the data structures more compact. e.g. use compact strings, use primitives instead of wrappers e.g. long for a timestamp instead of Date or LocalDateTime. (long is about 1/8th the size)
  • if neither of these help, try moving some of the data off heap. e.g. Chronicle Map is a ConcurrentMap which uses off heap memory can can reduce you GC times dramatically. i.e. there is no GC overhead for data stored off heap. How easy this is to add highly depends on how your data is structured.

I suggest analysing how your data is structured to see if there is any easy ways to make it more efficient.

Upvotes: 3

user2743227
user2743227

Reputation:

When this happened to me, it was due to a memory leak caused by a static variable eating up memory. I would go through all recent code changes and look for any possible memory leaks.

Upvotes: 0

Related Questions