user882659

Reputation: 161

Java NIO Selector Hang (jdk1.6_20)

I'm using jdk1.6_20 on Linux 2.6. I am observing a behavior where the NIO Selector, after calling Selector.select(timeout) with timeout = 5 sec, fails to wake up within the timeout. It returns much later, with a delay of a couple of seconds (2 to 10 seconds). This seems to happen frequently during the first couple of minutes after application start-up and stabilizes later on. Since our server exchanges heartbeats with the client, the selector failing to wake up on time causes it to miss heartbeats, and the peer then disconnects us.
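One way to quantify the delay is to time select(timeout) on a selector with no registered channels, so any overrun comes from thread scheduling rather than from I/O readiness. The following is a minimal diagnostic sketch, not part of the original setup; the class name, the 100 ms probe timeout and the 50 ms warning threshold are hypothetical values chosen for illustration.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical diagnostic: measures how late Selector.select(timeout) returns
 * compared with the requested timeout. With no keys registered, any extra
 * delay comes from scheduling, not from channel readiness.
 */
public class SelectOvershootProbe {

    /** Runs one select(timeoutMillis); returns overshoot in ms (negative if it returned early). */
    static long measureOvershootMillis(Selector selector, long timeoutMillis) throws IOException {
        long start = System.nanoTime();
        selector.select(timeoutMillis); // returns on timeout, wakeup() or thread interrupt
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return elapsedMillis - timeoutMillis;
    }

    public static void main(String[] args) throws IOException {
        long timeoutMillis = 100;      // hypothetical probe timeout
        long warnThresholdMillis = 50; // hypothetical threshold for "late"
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < 20; i++) {
                long overshoot = measureOvershootMillis(selector, timeoutMillis);
                if (overshoot > warnThresholdMillis) {
                    System.out.println("select(" + timeoutMillis + ") returned " + overshoot + " ms late");
                }
            }
        }
    }
}
```

Running this during start-up and again once the application has settled would show whether the overrun is a general scheduling problem or specific to the busy start-up period.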

Any help appreciated. Thanks.

Upvotes: 1

Views: 1339

Answers (4)

user882659

Reputation: 161

Hmm... actually the story doesn't stop there. We are not using incremental CMS, so during the concurrent phase the collector does not relinquish the CPU. We have 2 application servers on the same 16-core host, each running 4 parallel CMS threads in addition to roughly 45 to 60 application threads. CPU starvation is therefore the most likely cause, especially since every time the selector is delayed, it happens right after the concurrent-mark phase, within 100~200 milliseconds.
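To check the CPU-starvation theory, one could snapshot the JVM's garbage-collector counters around each select() call and log which collectors ran whenever select() returned late. A minimal sketch using the standard java.lang.management API follows; the class and method names, the 100 ms timeout and the 50 ms threshold are hypothetical, and which CMS phases (if any) are reflected in getCollectionCount() depends on the collector and JDK version, so treat the output as a hint rather than proof.

```java
import java.io.IOException;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.nio.channels.Selector;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: correlates late select() returns with GC collector activity. */
public class GcActivitySnapshot {

    /** Collector name -> cumulative collection count; collectors reporting -1 (undefined) are skipped. */
    static Map<String, Long> snapshotCounts() {
        Map<String, Long> counts = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count >= 0) {
                counts.put(gc.getName(), count);
            }
        }
        return counts;
    }

    /** Collectors whose count grew between two snapshots, mapped to how many collections ran. */
    static Map<String, Long> diff(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> changed = new HashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long previous = before.get(e.getKey());
            long delta = e.getValue() - (previous == null ? 0L : previous);
            if (delta > 0) {
                changed.put(e.getKey(), delta);
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        long timeoutMillis = 100; // hypothetical probe timeout
        long lateMillis = 50;     // hypothetical "late" threshold
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < 50; i++) {
                Map<String, Long> before = snapshotCounts();
                long start = System.nanoTime();
                selector.select(timeoutMillis);
                long overrun = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) - timeoutMillis;
                if (overrun > lateMillis) {
                    System.out.println("select late by " + overrun + " ms; GC activity: "
                            + diff(before, snapshotCounts()));
                }
            }
        }
    }
}
```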

Upvotes: 0

user207421

Reputation: 311023

fails to wake-up within the timeout(timeout=5 sec).

It's not supposed to 'wake up within the timeout'. It is supposed to wake up after the timeout expires. If you're supposed to send heartbeats within 5 seconds, a timeout of 5 seconds is too long. I would make it 2.5 s in this case.
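A sketch of that advice, assuming a 5-second peer timeout as in the question: send heartbeats at half that interval and bound each select() by the time left until the next heartbeat is due, rather than by a fixed timeout. The class, the method names and the halving policy are illustrative assumptions, and the actual heartbeat write is left as a placeholder.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

/** Hypothetical selector loop that schedules heartbeats at half the peer's timeout. */
public class HeartbeatSelectLoop {

    /** Send period: half the interval after which the peer gives up (assumed policy). */
    static long sendPeriodMillis(long peerTimeoutMillis) {
        return peerTimeoutMillis / 2;
    }

    /** Timeout to pass to select(); at least 1 ms, because select(0) blocks indefinitely. */
    static long selectTimeoutMillis(long nextHeartbeatNanos, long nowNanos) {
        long remaining = TimeUnit.NANOSECONDS.toMillis(nextHeartbeatNanos - nowNanos);
        return Math.max(1L, remaining);
    }

    public static void main(String[] args) throws IOException {
        long periodNanos = TimeUnit.MILLISECONDS.toNanos(sendPeriodMillis(5000)); // 5 s from the question
        long nextHeartbeat = System.nanoTime() + periodNanos;
        int sent = 0;
        try (Selector selector = Selector.open()) {
            while (sent < 3) { // demo: stop after three heartbeats
                selector.select(selectTimeoutMillis(nextHeartbeat, System.nanoTime()));
                // ... process selector.selectedKeys() here ...
                selector.selectedKeys().clear();
                long now = System.nanoTime();
                if (now - nextHeartbeat >= 0) {
                    System.out.println("send heartbeat"); // placeholder for the real heartbeat write
                    sent++;
                    nextHeartbeat = now + periodNanos;
                }
            }
        }
    }
}
```

Computing the timeout from the next deadline means a select() that returns early because of I/O does not postpone the heartbeat, and the 2.5 s margin absorbs some scheduling delay.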

Upvotes: 0

irreputable

Reputation: 45453

It doesn't matter what the timeout is: as soon as a client connects, the selector should wake up immediately. Therefore you have some more serious bugs.
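To illustrate that point, here is a minimal self-contained check (the class and method names are hypothetical): a non-blocking ServerSocketChannel is registered for OP_ACCEPT, a second thread connects after a short delay, and select() with a 5-second timeout should return after roughly that delay rather than after the full timeout.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.TimeUnit;

public class AcceptWakeupCheck {

    /** Returns how long select(timeoutMillis) blocked before an incoming connection was reported. */
    static long millisUntilAcceptReady(long timeoutMillis, long connectDelayMillis)
            throws IOException, InterruptedException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // ephemeral port
            server.configureBlocking(false); // a channel must be non-blocking to be registered
            server.register(selector, SelectionKey.OP_ACCEPT);
            final SocketAddress address = server.getLocalAddress();

            Thread connector = new Thread(() -> {
                try {
                    Thread.sleep(connectDelayMillis);
                    // Connect, then close; the connection stays queued for accept() on the server.
                    SocketChannel.open(address).close();
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            connector.start();

            long start = System.nanoTime();
            int ready = selector.select(timeoutMillis);
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            connector.join();

            if (ready == 0) {
                throw new IllegalStateException("select() returned without reporting the connection");
            }
            SocketChannel accepted = server.accept(); // non-blocking: null if nothing is pending
            if (accepted != null) {
                accepted.close();
            }
            return elapsedMillis;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("select(5000) returned after " + millisUntilAcceptReady(5000, 200) + " ms");
    }
}
```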

Upvotes: 0

nfechner

Reputation: 17535

From the Javadoc for Selector.select(long):

This method does not offer real-time guarantees: It schedules the timeout as if by invoking the Object.wait(long) method.

Since application start-up can put a lot of stress on a system, this may lead to wakeup delays.

As a solution, switch to Selector.selectNow(), which is non-blocking, and handle retries in your application code.
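A minimal sketch of the selectNow() approach, with hypothetical names and a 10 ms back-off chosen only for illustration: poll without blocking, and let application code decide when a deadline such as the next heartbeat has been reached. Note that Thread.sleep() between polls is not a real-time primitive either, so the back-off should stay small relative to the heartbeat deadline.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

/** Hypothetical polling loop built on Selector.selectNow(). */
public class SelectNowPoller {

    /**
     * Polls until at least one key becomes ready or deadlineNanos passes.
     * Returns the number of keys updated by the last selectNow() (0 if the deadline came first).
     */
    static int pollUntil(Selector selector, long deadlineNanos, long backoffMillis)
            throws IOException, InterruptedException {
        while (true) {
            int ready = selector.selectNow(); // never blocks
            if (ready > 0) {
                return ready;
            }
            if (System.nanoTime() - deadlineNanos >= 0) {
                return 0;
            }
            Thread.sleep(backoffMillis); // still subject to OS scheduling delays
        }
    }

    public static void main(String[] args) throws Exception {
        try (Selector selector = Selector.open()) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(2500);
            int ready = pollUntil(selector, deadline, 10); // hypothetical 10 ms back-off
            System.out.println(ready == 0 ? "deadline reached: send heartbeat" : ready + " keys ready");
        }
    }
}
```

As with select(), the caller should clear selectedKeys() after processing them, since selectNow() only counts keys whose ready sets were updated by that call.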

Upvotes: 2
