Matt Sheppard

Reputation: 118013

Why is this Java process failing to terminate?

I have an intermittent problem on a build server where a Java process in the build somehow fails to terminate and seems to continue running (using 100% of the CPU) forever (I've seen it run for 2+ days over the weekend where it normally takes about 10 minutes). kill -9 pid seems to be the only way to stop the process.

I have tried calling kill -QUIT pid on the process, but it does not seem to produce any stack trace to STDOUT (maybe it's not responding to the signal?). jstack without the -F force option appears to be unable to connect to the running JVM, but with the force option it does produce the output included below.

Unfortunately, even with that stack trace I can't see any obvious path for further investigation.

As far as I can tell it shows two 'BLOCKED' threads waiting in Object.wait (their stacks contain only core Java code, nothing of ours), another 'BLOCKED' thread with no stack at all, and a fourth which is 'IN_VM' with no stack output.

What steps should I take to gather more information about the cause of the problem (or better yet, how can I solve it)?

$ /opt/jdk1.6.0_29/bin/jstack -l -F 5546
Attaching to process ID 5546, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.4-b02
Deadlock Detection:

No deadlocks found.

Finding object size using Printezis bits and skipping over...
Thread 5555: (state = BLOCKED)

Locked ownable synchronizers:
    - None

Thread 5554: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=118 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=134 (Interpreted frame)
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 5553: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
 - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=116 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 5548: (state = IN_VM)

Locked ownable synchronizers:
    - None

(Java version 1.6.0 update 29, running on Scientific Linux release 6.0)

Update:

Running strace -f -p 894 produces a seemingly endless stream of...

[pid   900] sched_yield()               = 0
[pid   900] sched_yield()               = 0
...

and then, when interrupted with Ctrl-C:

Process 894 detached
...
Process 900 detached
...
Process 909 detached

jmap -histo 894 fails to connect, but jmap -F -histo 894 returns...

Attaching to process ID 894, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.4-b02
Iterating over heap. This may take a while...
Finding object size using Printezis bits and skipping over...
Finding object size using Printezis bits and skipping over...
Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:      11356   1551744 * MethodKlass
2:      11356   1435944 * ConstMethodKlass
3:      914 973488  * ConstantPoolKlass
4:      6717    849032  char[]
5:      16987   820072  * SymbolKlass
6:      2305    686048  byte[]
7:      914 672792  * InstanceKlassKlass
8:      857 650312  * ConstantPoolCacheKlass
9:      5243    167776  java.lang.String
10:     1046    108784  java.lang.Class
11:     1400    87576   short[]
12:     1556    84040   * System ObjArray
13:     1037    64584   int[]
14:     103 60152   * ObjArrayKlassKlass
15:     622 54736   java.lang.reflect.Method
16:     1102    49760   java.lang.Object[]
17:     937 37480   java.util.TreeMap$Entry
18:     332 27960   java.util.HashMap$Entry[]
19:     579 27792   java.nio.HeapByteBuffer
20:     578 27744   java.nio.HeapCharBuffer
21:     1021    24504   java.lang.StringBuilder
22:     1158    24176   java.lang.Class[]
23:     721 23072   java.util.HashMap$Entry
24:     434 20832   java.util.TreeMap
25:     689 18936   java.lang.String[]
26:     238 17440   java.lang.reflect.Method[]
27:     29  16800   * MethodDataKlass
28:     204 14688   java.lang.reflect.Field
29:     330 13200   java.util.LinkedHashMap$Entry
30:     264 12672   java.util.HashMap
...
585:        1   16  java.util.LinkedHashSet
586:        1   16  sun.rmi.runtime.NewThreadAction$2
587:        1   16  java.util.Hashtable$EmptyIterator
588:        1   16  java.util.Collections$EmptySet
Total :     79700   8894800
Heap traversal took 1.288 seconds.

Upvotes: 11

Views: 7432

Answers (7)

Brian Cain

Reputation: 14619

Here are some tools you can use to pin down which part of the process is consuming CPU:

  • perf / oprofile, especially opannotate -- great for seeing what code is consuming cycles
  • strace, gstack / gdb (as mentioned by others)
  • systemtap is enormously powerful, but limited in some of the same ways as the ptrace-based tools (if your problem doesn't involve a syscall, it's much less effective)
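The perf suggestion can be sketched roughly as below. This is a sketch under assumptions, not a recipe: 894 is the pid from the question's update, the flags assume a reasonably recent perf, and on a JVM of this vintage JIT-compiled Java frames will show as raw addresses while VM-internal (C++) frames resolve against libjvm.so.

```shell
PID=894   # pid of the stuck java process, from the question's update
if command -v perf >/dev/null 2>&1; then
  # Sample on-CPU call stacks for 10 seconds, then show the hottest frames.
  perf record -g -p "$PID" -- sleep 10 || echo "perf could not attach to $PID"
  perf report --stdio 2>/dev/null | head -40
else
  echo "perf not installed; try oprofile/opannotate instead"
fi
```

If the hot frames are all inside libjvm.so, that points at a VM thread (GC, compiler) rather than application code, which would match the empty IN_VM stack in the jstack output.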

Upvotes: 0

bestsss

Reputation: 12056

Take a snapshot while the process is running normally via jstack -F (the -F has to be present, since it produces a different snapshot than plain jstack). The thread numbers are not Thread.getId() values but system (native) thread ids. Thread 5548 seems to have been created before the Finalizer and Reference Handler threads (which are not the source of the issue), so it should be either a GC thread or a compiler thread.

100% CPU probably means a bug in a monitor. Java (HotSpot) monitors use a very simple spin-locking mechanism to ensure ownership.

And of course, attach a debugger (gdb) to check exactly where the process is stuck.
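A non-interactive way to do that might look like this (a sketch: 894 is the pid from the question's update, and ptrace restrictions such as Yama ptrace_scope can make the attach fail for non-root users):

```shell
PID=894   # pid of the stuck java process, from the question's update
# Dump a native backtrace of every thread, then detach. The JVM is
# only paused while gdb is attached.
gdb -p "$PID" -batch -ex 'thread apply all bt' 2>/dev/null \
  || echo "gdb could not attach to $PID"
```

A native backtrace of thread 5548 would show whether it is spinning inside GC, the JIT compiler, or monitor code, which jstack -F's empty IN_VM stack cannot tell you.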

Upvotes: 2

Gene Wu

Reputation: 396

I am encountering a similar issue: my JBoss JVM gets into an infinite loop and eventually throws OutOfMemoryError, and I can't kill the process except with kill -9. I suspect a memory issue in most cases.

Upvotes: 1

ninjalj

Reputation: 43688

You can always do strace -f -p pid to see what the Java process is doing. From the look of it (you cannot get a jstack without -F, and Thread 5548 shows no call stack and is IN_VM), it looks like thread 5548 is taking too long to do something, or is possibly stuck in an infinite loop.
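To tie the spinning LWP that strace reports (pid 900 in the question's update) back to a JVM thread, the decimal LWP id can be converted to hex and matched against the nid= field of a non-forced thread dump (a sketch; note that jstack -F, as used in the question, prints decimal system thread ids instead):

```shell
TID=900                      # spinning LWP id reported by strace / top -H
NID=$(printf '0x%x' "$TID")  # JVM thread dumps print native ids in hex
echo "$NID"                  # -> 0x384
# Match it against a (non-forced) thread dump or kill -QUIT output:
#   jstack <pid> | grep "nid=$NID"
```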

Upvotes: 3

aishwarya

Reputation: 1986

I would suspect a memory issue. You may want to watch the process using jstat and take a heap dump using jmap around the time you would normally need to kill the process. See if jstat indicates continuous GC. Also, you may want to check the system's health in general (open file descriptors, network, etc.). Memory would be the easiest to check, so I would strongly recommend starting with it.
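A sketch of that, assuming 894 (the pid from the question's update) and JDK 6 tooling; FGC/FGCT climbing while the old-generation occupancy (O) stays pinned near 100 is the signature of a GC death spiral:

```shell
PID=894   # pid of the stuck java process, from the question's update
# Five one-second samples of GC utilisation.
jstat -gcutil "$PID" 1000 5 || echo "jstat could not attach to $PID"
# Heap dump for offline analysis (jhat, Eclipse MAT):
jmap -dump:format=b,file=/tmp/stuck-jvm.hprof "$PID" \
  || echo "jmap could not attach to $PID"
```

Note that, as with jstack in the question, a wedged JVM may only respond to the -F variants of these tools, and a forced jmap cannot produce a binary dump, only a histogram.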

Upvotes: 2

ewan.chalmers

Reputation: 16235

Thread 5554 might indicate that you have a lot of objects with finalize() methods, and/or a problem with some finalize() method. It might be worthwhile to look into that.

I wasn't familiar with jstack, but it looks like it outputs less information than the thread dumps I am more familiar with. It might be useful to try to get a thread dump: kill -QUIT java_pid. Note that the dump goes to stdout, which might be the console or a log file depending on your setup.

If it's hard to figure out where stdout is directed, and assuming it is going to a file, you could use find with a recent-modification-time filter to identify candidate files. This is suggested in a comment to this blog post:

You can run the find command at your root directory and find out what changed in the last x seconds. I've usually used find to get at all the logs that changed in the last few minutes, e.g.: find /var/tomcat -mmin -3 -print (prints all the files modified under /var/tomcat in the last 3 minutes).
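On Linux there is also a more direct route than searching by modification time: ask the kernel where the process's stdout actually points (a sketch; $$ stands in for the JVM's pid here so the snippet runs anywhere):

```shell
PID=$$   # stand-in pid; use the JVM's pid on the build server
# fd 1 is stdout: the symlink target is the terminal, pipe, or log file
# that a kill -QUIT thread dump would be written to.
readlink "/proc/$PID/fd/1"
```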

Note that if you are running your JVM with -Xrs, this means that the SIGQUIT signal handler will not be installed and you will not be able to use that means of requesting a thread dump.

Upvotes: 1

Peter Szanto

Reputation: 7722

This might be caused by an out-of-memory condition too. I would try two things:

  • Enable an automatic heap dump on OutOfMemoryError by adding the JVM parameters

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp

  • Try to connect to your JVM with JConsole and see if there is any unusual pattern

Upvotes: 2
