Reputation: 1712
I've testing the fault tolerant system of akka and so far it's been good when talking about retrying to send a msg according the maxNrOfRetries specified.
However, it does not restart the actor within the given time range, it restarts all at once, ignoring the within time range.
I tried with AllForOneStrategy and OneForOneStrategy but does not change anything.
Trying to follow this blog post: http://letitcrash.com/post/23532935686/watch-the-routees, this is the code I've been working.
class Supervisor extends Actor with ActorLogging {
var replyTo: ActorRef = _
val child = context.actorOf(
Props(new Child)
.withRouter(
RoundRobinPool(
nrOfInstances = 5,
supervisorStrategy =
AllForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 10.second) {
case _: NullPointerException => Restart
case _: Exception => Escalate
})), name = "child-router")
child ! GetRoutees
def receive = {
case RouterRoutees(routees) =>
routees foreach context.watch
case "start" =>
replyTo = sender()
child ! "error"
case Terminated(actor) =>
replyTo ! -1
context.stop(self)
}
}
class Child extends Actor with ActorLogging {
override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
log.info("***** RESTARTING *****")
message foreach{ self forward }
}
def receive = LoggingReceive {
case "error" =>
log.info("***** GOT ERROR *****")
throw new NullPointerException
}
}
object Boot extends App {
val system = ActorSystem()
val supervisor = system.actorOf(Props[Supervisor], "supervisor")
supervisor ! "start"
}
Am I doing anything wrong to accomplish that?
EDIT
Actually, I misunderstood the purpose of the withinTimeRange. To schedule my retries in a time range, I'm doing the following:
override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
log.info("***** RESTARTING *****")
message foreach { msg =>
context.system.scheduler.scheduleOnce(30.seconds, self, msg)
}
}
It seems to work ok.
Upvotes: 3
Views: 226
Reputation: 5424
From docs:
maxNrOfRetries - the number of times a child actor is allowed to be restarted, negative value means no limit, if the limit is exceeded the child actor is stopped
withinTimeRange - duration of the time window for maxNrOfRetries, Duration.Inf means no window
Your code means that when any child fails with NullPointerException
more than 3 times within 10 seconds it will not be restarted again. Because of AllForOneStrategy
after first Routee fails all routees are restarted. And because you've overridden preRestart
to resend failed message this situation repeats again until reaches 3 failures within 10 seconds(which is achieved in less than a second).
Upvotes: 2
Reputation: 35443
I think you have misunderstood the purpose of the withinTimeRange
arg. That value is supposed to be used in conjunction with maxNrOfRetries
to provide a window in which to support the limiting of the number of retries. For example, as you have specified, the implication is that the supervisor will no longer restart an individual child if that child needs to be restarted more than 3 times in 10 seconds.
Upvotes: 4