Reputation: 47
Summary
I have a server that should be long-running, and that spawns a few background threads for IO. I'm trying to make sure that the background/IO threads don't go down, or that they'll be brought back up if they do go down.
Current Solution
Currently my main loop just checks the status of all the background threads on each pass (pseudo-code below). I think there should be a better way.
while (!Thread.currentThread().isInterrupted()) {
    maintainThreads();
    doWork();
    condition.await(30, TimeUnit.SECONDS);
}
My Attempt
I'm considering switching to a SingleThreadExecutor, with a custom queue that won't remove the Runnable when it pulls the next task. The executor would then manage the threads for me, so I could take that out of my main loop.
I'm worried that having one executor for each thread will be a performance hit, and that there are simpler/better solutions that exist for this problem. I've also considered setting up shutdown hooks for each thread to have them just restart themselves.
Any help would be appreciated.
Upvotes: 4
Views: 1678
Reputation: 4282
For an application program, recreating or restarting the process (not the thread) is the most reliable failure-recovery method.
How do really mission-critical systems handle failure? By providing redundancy, heartbeat monitoring, fast handover, and so on.
Don't blindly try to keep an already failed thread alive. Many things can wreak havoc on a process, and we humans know only a few of them.
If we FAIL FAST and restart the process, the OS kernel guarantees a clean initial state. So even if our program is not very reliable, it will still run and do its job for some stretch of time.
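One way to sketch the fail-fast idea in Java is a default uncaught-exception handler that ends the process as soon as any thread dies unexpectedly. The `FailFast` class and its `exitAction` parameter are hypothetical names introduced for illustration, and the sketch assumes some external supervisor (an init system, a wrapper script, or similar) is responsible for restarting the process.

```java
import java.util.function.IntConsumer;

// Hypothetical helper, not from the answer: treats any uncaught exception as fatal.
class FailFast {
    // exitAction is injected so the policy can be exercised without killing
    // the JVM; in production it might be Runtime.getRuntime()::halt.
    static void install(IntConsumer exitAction) {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            System.err.println("Fatal error in " + thread.getName() + ": " + error);
            // A non-zero status signals the (assumed) supervisor that we crashed.
            exitAction.accept(1);
        });
    }
}
```

At startup this might be wired as `FailFast.install(Runtime.getRuntime()::halt);`. Using `halt` rather than `System.exit` skips shutdown hooks, which may or may not be what you want when the process state is already suspect.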
Upvotes: 1
Reputation: 329
An important part of maintaining persistent background threads is handling your exceptions correctly at the Thread level. When handling error conditions and especially exceptions in your top-level server/daemon code you need to keep in mind that some exceptions can't be handled! When such an exception is encountered you should quit immediately or try to clean as much as you can and then quit.
For example, most exceptions of type Error shouldn't be handled. This includes the java.lang.VirtualMachineError subclasses: InternalError, OutOfMemoryError, StackOverflowError, UnknownError, etc. As the previous answer mentions, catching Throwable is a big no-no, as many exceptions can't be recovered from. Think about your failure strategies: when would failing make sense, and what can you do in that case (maybe log an error, or display a message to the user)?
Try to always properly handle InterruptedException as it gives you time to clean up and gracefully shut down your threads. Otherwise you are risking data corruption.
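A minimal sketch of what handling InterruptedException properly could look like in a background worker. The `InterruptibleWorker` class, its queue, and the `cleanedUp` flag are illustrative assumptions, not part of the original question; the point is only the shape of the loop: stop on interrupt, restore the interrupt flag, and release resources in `finally`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical IO worker that shuts down gracefully when interrupted.
class InterruptibleWorker implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    void submit(String item) {
        queue.add(item);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // poll() throws InterruptedException if interrupted while waiting
                String item = queue.poll(1, TimeUnit.SECONDS);
                if (item != null) {
                    handle(item);
                }
            }
        } catch (InterruptedException e) {
            // Restore the flag so code further up the stack can still see it.
            Thread.currentThread().interrupt();
        } finally {
            // Flush buffers, close sockets, etc. Runs on every exit path.
            cleanedUp.set(true);
        }
    }

    private void handle(String item) {
        // Placeholder for the real IO work.
    }
}
```

Calling `interrupt()` on the thread running this worker ends the loop at the next check or blocking call, and the `finally` block gets a chance to run before the thread exits.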
For more exception handling tips check my Exceptions Guidelines post.
Upvotes: 2
Reputation: 65006
The real gotcha here is what you mean by go down in "or that they'll be brought back up if they do go down."
There are only two ways that I know of that a thread can go down without the entire process itself exiting in Java:
1. The run() method terminates, either via an exception or by finishing normally (i.e., non-exceptionally).
2. Thread.stop() is called on your thread.
Let's tackle (2) first: Thread.stop() is deprecated and is a big no-no in any well-behaved application. You can pretty much assume it is not going to be called, because if it is called, your application is already badly broken. Restarting any thread at this point may have undefined effects, since your application is in an inconsistent state.
So then for (1), you just have to ensure that run() doesn't terminate. It won't terminate normally because you've already set up an infinite loop. To stop it from terminating exceptionally, you can catch (Throwable t) and just keep looping (after logging the error appropriately).
Of course, a catch (Throwable t) without a subsequent rethrow is usually a code smell. It means you caught some type of unspecified error, and then decided to keep going anyway. The errors might range from the benign (e.g., a SocketClosedException because a remote client disconnected) to the unrecoverable (e.g., an OutOfMemoryError or something even worse). You should really ask yourself if you want this thread to continue in the face of any type of exception.
Your application could be in an invalid state and may not be able to continue. One compromise would be to catch only subclasses of Exception and terminate the application on Error. A more conservative approach would be to terminate the application on any type of exception that you don't know how to handle (and treat it as a bug to be fixed).
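The compromise described above might look roughly like the following. `ResilientLoop`, its `task` field, and the `recovered` counter are hypothetical names used only for this sketch; the essential point is that the loop catches Exception and keeps going, while any Error is deliberately left uncaught so it escapes run().

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: survives Exception, lets Error end the thread.
class ResilientLoop implements Runnable {
    private final Runnable task; // one unit of background work (assumed)
    final AtomicInteger recovered = new AtomicInteger();

    ResilientLoop(Runnable task) {
        this.task = task;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                task.run();
            } catch (Exception e) {
                // Treated as recoverable: log it and go around again.
                recovered.incrementAndGet();
                System.err.println("Recovered from: " + e);
            }
            // No catch for Error: OutOfMemoryError and friends propagate
            // out of run() and terminate this thread.
        }
    }
}
```

On its own, an escaping Error only ends the one thread. To terminate the whole application, as the compromise suggests, you would also need something like an uncaught-exception handler on the thread that shuts the process down.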
Upvotes: 2