Reputation: 5105
We have a Java EE application with a heap size of several gigabytes on our production servers. Every once in a while, any one of our servers no longer reacts to any requests.
How would you start diagnosing this?
Our JAVA_OPTS are the following:
-Xms8096m -Xmx8096m -XX:MaxPermSize=512M -Dsun.rmi.dgc.client.gcInterval=1800000 -Dsun.rmi.dgc.server.gcInterval=1800000 -XX:NewSize=150M -XX:+UseParNewGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc.log
$ java -version
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)
$ uname -a
Linux myhostname 2.6.18-274.3.1.el5 #1 SMP Tue Sep 6 20:13:52 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/version
Linux version 2.6.18-274.3.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)) #1 SMP Tue Sep 6 20:13:52 EDT 2011
$ cat /etc/issue
CentOS release 5.7 (Final)
Kernel \r on an \m
$ cat /proc/meminfo|grep "MemTotal"
MemTotal: 16279356 kB
This is an example GC log snippet from when the problem occurred:
111036.554: [GC 111036.555: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111036.555: [Tenured: 3629252K->3647971K(5526912K), 8.7565190 secs] 5840068K->3647971K(8014016K), 8.7567840 secs]
111055.691: [GC 111055.691: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111055.691: [Tenured: 3647971K->3667529K(5526912K), 8.7876340 secs] 5858787K->3667529K(8014016K), 8.7878690 secs]
111071.037: [GC 111071.037: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111071.037: [Tenured: 3667529K->3692057K(5526912K), 8.7581830 secs] 5878345K->3692057K(8014016K), 8.7584210 secs]
111088.407: [GC 111088.407: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111088.407: [Tenured: 3692057K->3638194K(5526912K), 10.7072790 secs] 5902873K->3638194K(8014016K), 10.7074960 secs]
111110.238: [GC 111110.238: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111110.238: [Tenured: 3638194K->3654614K(5526912K), 8.8021440 secs] 5849010K->3654614K(8014016K), 8.8023860 secs]
111128.115: [GC 111128.115: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111128.115: [Tenured: 3654614K->3668670K(5526912K), 8.8451510 secs] 5865430K->3668670K(8014016K), 8.8453600 secs]
111161.684: [GC 111161.684: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111161.684: [Tenured: 3668670K->3684080K(5526912K), 8.8156740 secs] 5879486K->3684080K(8014016K), 8.8159260 secs]
111186.669: [GC 111186.669: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111186.669: [Tenured: 3684080K->3639333K(5526912K), 10.6025350 secs] 5894896K->3639333K(8014016K), 10.6030040 secs]
111208.692: [GC 111208.692: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111208.692: [Tenured: 3639333K->3657993K(5526912K), 8.7967920 secs] 5850149K->3657993K(8014016K), 8.7970090 secs]
111235.486: [GC 111235.487: [ParNew: 2210816K->2210816K(2487104K), 0.0000090 secs]111235.487: [Tenured: 3657993K->3676521K(5526912K), 8.8212340 secs] 5868809K->3676521K(8014016K), 8.8214930 secs]
Upvotes: 4
Views: 543
Reputation: 1856
It usually takes a bit of experimentation, but my guess is that you are seeing this problem every 30 minutes, right? When you have a full GC, like you have with those tenured lines, which does not seem to be caused by a full heap, they are very likely to be caused by System.gc(). Normally in the log it would be printed as (System), but because your VM is old, I am not sure. The GC log information changes every release. So to eliminate that, I strongly recommend to use "-XX:+DisableExplicitGC". It also ignores the call the DGC does.
The second option here might be that you have a memory leak/problem you failed to see in the dump. The lines say that you always have 2210816K in new, and old is still at 3676521K in the end. If all of the 2210816K are alive (what it looks like) then its impossible to move them to Tenured, because it will not fit (5887337K) so if it does not stop doing that you might get a GC Overhead exceeded message. But in that case you have to have those 6G live objects in heap when you took the dump
Upvotes: 0
Reputation: 533520
As you have a very old version of Java (almost four years old) and you appear to be getting into an error state, the first thing to try is a newer version such as Java 6 update 35. I suspect update 12 doesn't have Compressed Oops on by default which is an option which should save you some memory (and thus overhead)
Upvotes: 2
Reputation: 3515
First of all : a thread dump ! Kill -3 the process and check the log outputs.
You can confirm the GC by seeing the GC threads running.
You use 8Go for the JVM. How many gigs for the RAM ? (I hope at least 12).
Thread dump will show you the Eden, From, To, Permgen sizes. This may help to find which memory space is having a problem.
Upvotes: 1