Reputation: 116636
I have a distributed system that executes processes (not OS processes, just units of work that need to be done). After a few unsuccessful tries (timeouts), it reports a failure.
I want to keep trying to execute the process in the background afterwards. The question is: should I use a single, longer timeout period, or a timeout that grows with each try?
Upvotes: 1
Views: 1882
Reputation: 16162
I think the first option is the better choice. If the timeout doubles on each try, the gaps grow very quickly: starting at 1 minute, after about 1 hour of failures the next try is already about an hour away, and after about a day of failures it is about a day away. 1 -> 2, 2 -> 4, 4 -> 8, 8 -> 16...
I would go with the first approach and define a reasonable fixed timeout.
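To put numbers on the doubling schedule, here is a minimal sketch. The function name `doubling_delays` and the 1-minute starting value are only illustrative assumptions, not part of the system in the question.

```python
def doubling_delays(first_delay_min, attempts):
    """Return the wait (in minutes) before each retry when the wait doubles every time."""
    return [first_delay_min * 2 ** i for i in range(attempts)]


# Hypothetical schedule: start at 1 minute, look at the first 7 retries.
delays = doubling_delays(1, 7)

# Total time already spent waiting before the last retry in the list.
elapsed_before_last = sum(delays[:-1])

print(delays)               # waits before each retry, in minutes
print(elapsed_before_last)  # minutes waited before the final wait starts
```

With doubling, the next wait is always roughly as long as everything waited so far, which is why the gaps feel so large once failures have gone on for a while.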
Upvotes: 2
Reputation: 49290
It depends on why the first attempt failed.
If it is due to potential overload or temporary exhaustion of some resource, you might want to try an exponential backoff strategy. The reason is that continuous attempts to acquire the resource could make things even worse, and so would probably never lead to success.
If you are basically waiting for something to happen or become available, e.g. a port being open or a file being there ("polling", basically), you might just want to wait for fixed periods of time.
This is somewhat oversimplified, but may give some basic ideas. Just make sure that you thoroughly test whatever strategy (or combination thereof) you choose, to make sure that it (obviously) actually works and also does not worsen anything.
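The two strategies above can be sketched as interchangeable delay generators fed into one retry loop. This is only a rough illustration: `retry`, `exponential_delays`, `fixed_delays` and all default values (base, factor, cap, period) are hypothetical names and numbers chosen for the example, and the cap on the exponential delay is an added assumption to keep the waits bounded.

```python
import time


def exponential_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield waits that grow by `factor` each time, never exceeding `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def fixed_delays(period=5.0):
    """Yield the same wait forever (plain polling)."""
    while True:
        yield period


def retry(task, delays, max_attempts=5, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts calls have failed.

    `sleep` is injectable so the loop can be tested without really waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt, delay in zip(range(max_attempts), delays):
        try:
            return task()
        except Exception as exc:  # in real code, catch only the expected errors
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(delay)  # wait before the next attempt
    raise last_error
```

Usage would look like `retry(do_work, exponential_delays())` for the overload case and `retry(check_port, fixed_delays(2.0))` for the polling case, where `do_work` and `check_port` stand in for whatever the system actually executes.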
Upvotes: 5
Reputation: 50692
If there are many reasons why a process could fail, it might be worth redesigning the processes so that they can resume after something has gone wrong.
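One possible reading of this, as a minimal sketch: split a process into named steps and record which ones have finished, so a retry picks up where the last attempt stopped. The function `run_resumable` and the in-memory `completed` set are hypothetical; a real system would presumably persist that record somewhere durable.

```python
def run_resumable(steps, completed):
    """Run (name, action) steps in order, skipping names already in `completed`.

    If an action raises, the exception propagates, but every step that
    finished before it stays recorded, so the next call resumes from there.
    """
    for name, action in steps:
        if name in completed:
            continue  # finished in an earlier attempt
        action()
        completed.add(name)
    return completed
```

Each retry then only repeats the step that failed and the ones after it, which matters most when the early steps are slow or not safe to run twice.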
Upvotes: 2