Louis Q

Reputation: 176

Glassfish thread pool issues

We're using Glassfish 3.0.1 and experiencing very long response times, on the order of 5 minutes for 25% of our POST/PUT requests. By the time the response comes back, the front-facing load balancer has timed out.

My theory is that the requests are queuing up and waiting for an available thread.

The reason I think this is that the access logs show the requests themselves taking only a few seconds to complete, yet the times at which they are executed are five minutes later than I'd expect.

Does anyone have any advice for debugging what is going on with the thread pools, or for what the optimum settings should be for them?

Is it required to take a thread dump periodically, or will a one-off dump be sufficient?

Upvotes: 10

Views: 5652

Answers (3)

R.Moeller

Reputation: 3446

You usually get this behaviour when not enough worker threads are configured in your server. Default values range from 15 to 100 threads in common web servers, but if your application blocks the server's worker threads (e.g. by waiting on queries), the defaults are frequently far too low. You can increase the number of workers up to 1000 without problems (make sure you are on 64-bit). Also check the number of worker threads (sometimes referred to as 'max concurrent/open requests') of any in-between server, e.g. a proxy or an Apache instance forwarding via mod_proxy.
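
For GlassFish itself, the pool sizes can be inspected and changed with asadmin. The dotted names and the pool name http-thread-pool below are assumptions based on a stock GlassFish 3.x domain, so verify them with the get command before setting anything:

    # List the thread pools defined for the default server config, with their
    # current min/max sizes (adjust the dotted path if your config differs).
    asadmin get "configs.config.server-config.thread-pools.thread-pool.*"

    # Raise the maximum size of the pool your HTTP listener uses
    # ("http-thread-pool" is the usual default name; check your listener config).
    asadmin set configs.config.server-config.thread-pools.thread-pool.http-thread-pool.max-thread-pool-size=200

    # Restart the domain so the new size takes effect.
    asadmin restart-domain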

Another common pitfall is your software sending requests to itself (e.g. trying to reroute or forward a request) while blocking an incoming request.

Upvotes: 3

sky

Reputation: 447

Taking thread dumps is the best way to debug what is going on with the thread pools. Take 3-4 thread dumps one after another, with a 1-2 second gap between each dump.
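
For example, with a standard JDK you can script this with jps and jstack. This is only a minimal sketch: <pid> is a placeholder for the server's process id, and the output file names are arbitrary.

    # Find the PID of the GlassFish process (jps -l prints the main class name;
    # for GlassFish 3.x it is typically the ASMain bootstrap class).
    jps -l

    # Take four dumps, two seconds apart, into separate files.
    for i in 1 2 3 4; do
        jstack <pid> > threaddump-$i.txt
        sleep 2
    done

kill -3 <pid> works as well and writes each dump to the JVM's stdout, which normally ends up in the server log.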

From the thread dumps you can identify the worker threads by their names. Compare the successive dumps to find threads that are stuck at the same point, i.e. the long-running ones.

You may use the TDA tool (http://java.net/projects/tda/downloads/download/tda-bin-2.2.zip) for analyzing the thread dumps.

Upvotes: 2

saarp

Reputation: 1951

At first glance, this seems to have very little to do with the thread pools themselves. Without knowing much about the rest of your network setup, here are some things I would check:

  • Is there a dead/nonresponsive node in the load balancer pool? This can cause every request to be tried against that node until it times out, before being redirected to another node.
  • Is there some issue with the initial connections between the load balancer and the Glassfish server? This could be slow or incorrect DNS lookups (though the server should cache results), a missing proxy, or some other network-related problem.
  • Have you checked that the clocks are synchronized between the machines? Skewed clocks would throw the logs out of sync, and 5 minutes is a pretty strange timeout period; a quick check is sketched below.
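
For the clock question in the last bullet, one quick sanity check (assuming SSH access to the machines; the host names here are placeholders) is to print each machine's epoch time and compare:

    # Placeholder host names; substitute your load balancer and Glassfish hosts.
    for host in lb-01 app-01 app-02; do
        printf '%s: ' "$host"
        ssh "$host" date +%s
    done

Anything more than a second or two of skew is worth fixing (e.g. with NTP) before trusting cross-machine log timestamps.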

If all of these come up empty, you may simply have an impedance mismatch between the load balancer and the web server, and you may need to add web servers to handle the load. The load balancer should be able to give you plenty of statistics on the traffic coming in and how it is stacking up.

Upvotes: 6
