Reputation: 118013
I have an intermittent problem on a build server where a Java process in the build somehow fails to terminate and continues running (using 100% of the CPU) indefinitely; I've seen it run for 2+ days over a weekend when the build normally takes about 10 minutes. kill -9 pid seems to be the only way to stop the process.
I have tried calling kill -QUIT pid on the process, but it does not seem to produce any stack trace on STDOUT (maybe it's not responding to the signal?). jstack without the -F force option appears unable to connect to the running JVM, but with the force option it does produce the output included below.
Unfortunately, even with that stack trace I can't see any obvious path for further investigation.
As far as I can tell it shows two 'BLOCKED' threads which have run Object.wait (their stacks appear to contain only core Java code, nothing of ours) and a third which is 'IN_VM' with no stack output.
What steps should I take to gather more information about the cause of the problem (or better yet, how can I solve it)?
$ /opt/jdk1.6.0_29/bin/jstack -l -F 5546
Attaching to process ID 5546, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.4-b02
Deadlock Detection:

No deadlocks found.

Finding object size using Printezis bits and skipping over...

Thread 5555: (state = BLOCKED)

Locked ownable synchronizers:
    - None

Thread 5554: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=118 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=134 (Interpreted frame)
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 5553: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
 - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=116 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 5548: (state = IN_VM)

Locked ownable synchronizers:
    - None
(Java version 1.6.0 update 29, running on Scientific Linux release 6.0)
Update:
Running strace -f -p 894 produces a seemingly endless stream of...
[pid 900] sched_yield() = 0
[pid 900] sched_yield() = 0
...
and then, when Ctrl-C is pressed:
Process 894 detached
...
Process 900 detached
...
Process 909 detached
jmap -histo 894 does not connect, but jmap -F -histo 894 returns...
Attaching to process ID 894, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.4-b02
Iterating over heap. This may take a while...
Finding object size using Printezis bits and skipping over...
Finding object size using Printezis bits and skipping over...
Object Histogram:

num     #instances   #bytes   Class description
--------------------------------------------------------------------------
1:          11356   1551744   * MethodKlass
2:          11356   1435944   * ConstMethodKlass
3:            914    973488   * ConstantPoolKlass
4:           6717    849032   char[]
5:          16987    820072   * SymbolKlass
6:           2305    686048   byte[]
7:            914    672792   * InstanceKlassKlass
8:            857    650312   * ConstantPoolCacheKlass
9:           5243    167776   java.lang.String
10:          1046    108784   java.lang.Class
11:          1400     87576   short[]
12:          1556     84040   * System ObjArray
13:          1037     64584   int[]
14:           103     60152   * ObjArrayKlassKlass
15:           622     54736   java.lang.reflect.Method
16:          1102     49760   java.lang.Object[]
17:           937     37480   java.util.TreeMap$Entry
18:           332     27960   java.util.HashMap$Entry[]
19:           579     27792   java.nio.HeapByteBuffer
20:           578     27744   java.nio.HeapCharBuffer
21:          1021     24504   java.lang.StringBuilder
22:          1158     24176   java.lang.Class[]
23:           721     23072   java.util.HashMap$Entry
24:           434     20832   java.util.TreeMap
25:           689     18936   java.lang.String[]
26:           238     17440   java.lang.reflect.Method[]
27:            29     16800   * MethodDataKlass
28:           204     14688   java.lang.reflect.Field
29:           330     13200   java.util.LinkedHashMap$Entry
30:           264     12672   java.util.HashMap
...
585:            1        16   java.util.LinkedHashSet
586:            1        16   sun.rmi.runtime.NewThreadAction$2
587:            1        16   java.util.Hashtable$EmptyIterator
588:            1        16   java.util.Collections$EmptySet
Total :     79700   8894800
Heap traversal took 1.288 seconds.
Upvotes: 11
Views: 7432
Reputation: 14619
Here are some tools you can use to localize the part of the process consuming CPU:

- perf / oprofile, especially opannotate -- great for seeing what the hell code is consuming cycles
- strace, gstack / gdb (as mentioned by others)
- systemtap is enormously powerful, but limited in some of the same ways as the ptrace-based tools (if your problem doesn't involve a syscall, it's much less effective)

Upvotes: 0
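As a hedged sketch of the perf workflow mentioned above (perf must be installed and you need permission to profile the target; the shell's own pid is used only so the sketch is self-contained, substitute the spinning JVM's pid):

```shell
# Sample what a busy process is executing, then show the hottest
# symbols. Profiling may be denied by kernel.perf_event_paranoid,
# so failures are reported rather than aborting.
pid=$$   # placeholder: use the stuck JVM's pid in practice
if command -v perf >/dev/null 2>&1; then
    perf record -g -p "$pid" -o /tmp/perf.data -- sleep 2 \
        || echo "perf record failed (kernel.perf_event_paranoid?)"
    perf report -i /tmp/perf.data --stdio 2>/dev/null | head -n 20
else
    echo "perf not installed"
fi
```

For a spin loop like the sched_yield() stream in the question, the hottest symbols usually point straight at the looping code.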
Reputation: 12056
Take a snapshot while the process is running normally via jstack -F (-F has to be present; it produces a different snapshot than plain jstack). The thread numbers are not Thread.getId() values but system thread IDs. Thread 5548 seems to have been created before the Finalizer and Reference Handler threads (so those are not the source of the issue), which means it should be either a GC thread or a compiler one.
100% CPU probably means some bug in a monitor. Java (HotSpot) monitors use a very simple spin-locking mechanism to ensure ownership.
And of course, attach a debugger (GDB) to check where exactly the process is stuck.
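A minimal sketch of that GDB step, in batch mode so it detaches cleanly (assumes gdb is installed and you have ptrace permission on the target; the shell's own pid stands in for the JVM's here):

```shell
pid=$$   # placeholder: substitute the stuck JVM's pid
if command -v gdb >/dev/null 2>&1; then
    # Dump every native thread's stack, then detach without killing
    # the process. Attach can be refused by Yama ptrace_scope.
    gdb -p "$pid" -batch -ex "thread apply all bt" -ex detach \
        || echo "gdb attach failed (ptrace_scope?)"
else
    echo "gdb not installed"
fi
```

Unlike jstack -F, this shows the native frames of the IN_VM thread, which is exactly the one with no Java stack in the question's output.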
Upvotes: 2
Reputation: 396
I am encountering a similar issue: my JBoss JVM gets into an infinite loop and eventually hits OutOfMemoryError, and I can't kill the process except with kill -9. I suspect a memory issue in most such cases.
Upvotes: 1
Reputation: 43688
You can always do strace -f -p pid to see what the Java process is doing. From the look of it (you cannot get a jstack without -F, and thread 5548 shows no call stack and is IN_VM), it looks like thread 5548 is taking too long to do something, or is possibly stuck in some infinite loop.
Upvotes: 3
Reputation: 1986
I would suspect a memory issue. You may want to watch the process with jstat and take a heap dump with jmap around the time you would normally need to kill the process. See if jstat indicates continuous GC. Also check the system's general health (open file descriptors, network, etc.). Memory is the easiest to check, so I would strongly recommend starting there.
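A sketch of the jstat side of that (guarded, since the JDK tools may not be on PATH, and the pid is a placeholder):

```shell
pid=12345   # hypothetical pid of the suspect JVM
if command -v jstat >/dev/null 2>&1; then
    # 5 samples, 1 s apart. An FGC count that climbs on every sample
    # while O (old-gen utilization %) stays near 100 suggests the VM
    # is stuck doing back-to-back full GCs.
    jstat -gcutil "$pid" 1000 5 || echo "jstat could not attach"
else
    echo "jstat not on PATH (it ships with the JDK, not the JRE)"
fi
```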
Upvotes: 2
Reputation: 16235
Thread 5554 might indicate that you have a lot of objects with finalize() methods, and/or some problem with a finalize method. It might be worthwhile to look at that.
I wasn't familiar with jstack, but it looks like it outputs less information than the thread dumps I am more familiar with. It might be useful to try to get a thread dump: kill -QUIT java_pid. Note that the dump goes to stdout, which might be the console or a log file, depending on your setup.
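On Linux, /proc can tell you directly where a process's stdout points (the shell's own pid is used here only so the sketch is runnable as-is; substitute the JVM's pid):

```shell
# fd 1 is stdout; the symlink target shows the terminal, file, or
# pipe it was redirected to, i.e. where a SIGQUIT dump would land.
pid=$$   # placeholder: substitute the JVM's pid
readlink "/proc/$pid/fd/1"
```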
If it's hard to figure out where stdout is being directed, and assuming it is going to a file, you could use find by recent modification time to identify candidate files. This is suggested in a comment to this blog post:

    you can run the find command at your root directory and find out what changed in the last x seconds. I've usually used find to help me get to all the logs that changed in the last 10 minutes, e.g.: find /var/tomcat -mmin -3 -print (prints out all the files modified under /var/tomcat in the last 3 minutes).
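A self-contained version of that find technique (a temp directory stands in for /var/tomcat):

```shell
# Create a freshly-modified file, then locate everything modified in
# the last 3 minutes, exactly as in the quoted comment.
dir=$(mktemp -d)
touch "$dir/app.log"
find "$dir" -mmin -3 -type f -print
```

Run this right after sending kill -QUIT, against the directory tree you suspect holds the logs, and the file receiving the thread dump should be among the results.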
Note that if you are running your JVM with -Xrs, the SIGQUIT signal handler will not be installed and you will not be able to use that means of requesting a thread dump.
Upvotes: 1
Reputation: 7722
This might be caused by an OutOfMemoryError too. I would try two things:

- Enable automatic heap dump on OutOfMemoryError by adding the JVM parameters -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
- Try to connect to your JVM with JConsole and see if there is any unusual pattern
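Assembled as a launch command (note that both options need the leading -XX:; app.jar is a placeholder for the actual build entry point, so the sketch only prints the command rather than running it):

```shell
# Corrected flag spelling; the heap dump is written to /tmp when an
# OutOfMemoryError is thrown.
opts="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
echo java $opts -jar app.jar
```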
Upvotes: 2