huahsin68

Reputation: 6989

Is there a standard practice for diagnosing hung threads in WebSphere?

public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {

        if ((request instanceof HttpServletRequest)
                && (response instanceof HttpServletResponse)) {
            HttpServletRequest httpServletRequest = (HttpServletRequest) request;
            HttpServletResponse httpServletResponse = (HttpServletResponse) response;

            if (isSessionControlRequiredForThisResource(httpServletRequest)) {

                if (isSessionInvalid(httpServletRequest)) {

                    String encodedURL = httpServletRequest.getContextPath() + this.timeoutPage;

                    try {
                        httpServletResponse.sendRedirect(encodedURL);
                    } catch (Exception e) {
                        // Log the exception itself; fillInStackTrace() would overwrite its original stack trace.
                        logger.error("[Error happened in filter] : ", e);
                    }

                    return;
                }
            }

            if (!httpServletRequest.getRequestURI().startsWith(httpServletRequest.getContextPath() + ResourceHandler.RESOURCE_IDENTIFIER)) {
                httpServletResponse.setHeader("Cache-Control", "no-cache, no-store, must-revalidate");
                httpServletResponse.setHeader("Pragma", "no-cache");
                httpServletResponse.setDateHeader("Expires", 0);
            }
            }
            chain.doFilter(request, response);
        }

The code shown above sometimes stalls while the application is running, and the following warning then appears in SystemOut.log.

[8/26/13 8:38:39:873 MYT] 0000002c ThreadMonitor W WSVR0605W: Thread "WebContainer : 9" (00000037) has been active for 611221 milliseconds and may be hung. There is/are 7 thread(s) in total in the server that may be hung.

This warning is hard to diagnose because it is always followed by a very long stack trace that does not belong to my application. It usually occurs several times within a certain period (roughly 15 to 20 minutes), though the thread ID may differ each time.

I'm not able to reproduce this in unit testing on the UAT server, and I'm not sure what the root cause could be. It happens only occasionally. Is there a pattern for capturing this error? Does it happen after some other exception has occurred, say a lost DB connection, or while some background process is running, say retrieving a huge result set on the production server? I'm just trying to understand what circumstances could lead to this problem so that I can avoid it in my code.

Upvotes: 2

Views: 4052

Answers (2)

Scott Heaberlin

Reputation: 3424

This warning is hard to diagnose because it is always followed by a very long stack trace that does not belong to my application.

You'll probably want to share the stack trace here then - one associated with the ThreadMonitor hung thread message (WSVR0605W).

Building on Beryllium's answer on generating a thread dump (kill -3 <pid> works fine), you can use the IBM Thread and Monitor Dump Analyzer for Java to view the generated thread dump file. The tool shows each thread's state; look for Blocked threads whose names start with "WebContainer :" and see whether the monitors and other threads offer any clues. In fact, I recommend running kill -3 <pid> once, waiting about 15-30 seconds, then running kill -3 <pid> again. The analyzer lets you view a "diff" of the two dumps to see which threads are truly hung vs. those that may just be running slowly. It will also alert you to any heap exhaustion.
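
To make the "two dumps, then diff" idea concrete, here is a minimal in-JVM sketch of the same comparison. It is not how the IBM analyzer works internally; it only illustrates the principle using the standard Thread.getAllStackTraces() call. The class name StuckThreadDiff, the helper methods and the sleeping "WebContainer : demo" thread are all hypothetical, made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StuckThreadDiff {

    // Capture the current stack trace of every live thread, keyed by thread id.
    static Map<Long, StackTraceElement[]> snapshot() {
        Map<Long, StackTraceElement[]> result = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            result.put(e.getKey().getId(), e.getValue());
        }
        return result;
    }

    // Return ids of threads whose non-empty stack is identical in both snapshots.
    static List<Long> unchanged(Map<Long, StackTraceElement[]> first,
                                Map<Long, StackTraceElement[]> second) {
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, StackTraceElement[]> e : first.entrySet()) {
            StackTraceElement[] later = second.get(e.getKey());
            if (later != null && later.length > 0 && Arrays.equals(e.getValue(), later)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for a hung request thread: it just sleeps.
        Thread stuck = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // demo only
            }
        }, "WebContainer : demo");
        stuck.setDaemon(true);
        stuck.start();
        Thread.sleep(200); // give the thread time to reach sleep()

        Map<Long, StackTraceElement[]> first = snapshot();
        Thread.sleep(1_000); // in practice, wait 15-30 seconds as suggested above
        Map<Long, StackTraceElement[]> second = snapshot();

        System.out.println("stuck candidate found: "
                + unchanged(first, second).contains(stuck.getId()));
    }
}
```

A thread that sits at the same frames in both snapshots is only a candidate: idle pool threads waiting for work look the same, so you still have to read the stack to tell the two apart.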

Upvotes: 1

Beryllium

Reputation: 12998

  • Determine the PID of your WebSphere process, for example by using jps

$ jps
1039 java


  • Create a full thread dump by using jstack

$ jstack 1039

  • Or (if you are on a UNIX system):

$ kill -QUIT 1039

or, equivalently (SIGQUIT is signal 3):

$ kill -3 1039


  • Identify the thread(s) which hang (you get the name from the warning in the log file)
  • Identify the line where the threads hang
    • Look for race conditions, concurrent modifications in non-concurrent objects/iterators etc.
  • Inspect deadlocks as well (they are possibly appended to the full thread dump)
    • An example with a deadlock is here (look for threads in state BLOCKED).
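
If you have never seen what a deadlock looks like in a dump, the following sketch may help. It deliberately creates one and then asks the JVM about it through the standard java.lang.management.ThreadMXBean API, which reports the same BLOCKED state you would look for in a full thread dump. The class name DeadlockDemo, the thread names demo-1 and demo-2 and the helper methods are hypothetical and exist only for this illustration; I'm also assuming a JVM on which findDeadlockedThreads() is supported.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {

    // Start two daemon threads that take the same two locks in opposite order.
    static void startDeadlock() {
        final Object lockA = new Object();
        final Object lockB = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread t1 = new Thread(() -> grab(lockA, lockB, bothHoldFirstLock), "demo-1");
        Thread t2 = new Thread(() -> grab(lockB, lockA, bothHoldFirstLock), "demo-2");
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    static void grab(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) { // never acquired: the other thread holds it
                System.out.println("unreachable");
            }
        }
    }

    // Poll the JVM's deadlock detector for up to about five seconds.
    static long[] waitForDeadlock() throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            Thread.sleep(100);
        }
        return new long[0];
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlock();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(waitForDeadlock(), true, true)) {
            System.out.println(info.getThreadName() + " is " + info.getThreadState()
                    + ", waiting for a lock held by " + info.getLockOwnerName());
        }
    }
}
```

Running jstack against this process while it is deadlocked should show the same two threads in state BLOCKED, which is a handy way to practise reading a dump before you need to do it on a production server.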


Upvotes: 4
