Thiago Pereira

Reputation: 1712

Akka + WithinTimeRange

I've been testing Akka's fault-tolerance support, and so far it has worked well for retrying a message according to the specified maxNrOfRetries.

However, it does not spread the actor restarts over the given time range: all the restarts happen at once, ignoring withinTimeRange.

I tried both AllForOneStrategy and OneForOneStrategy, but it makes no difference.

Following this blog post, http://letitcrash.com/post/23532935686/watch-the-routees, this is the code I've been working with:

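// Imports were omitted in the original post; these wildcard imports are assumed
import akka.actor._
import akka.actor.SupervisorStrategy.{Escalate, Restart}
import akka.event.LoggingReceive
import akka.routing._
import scala.concurrent.duration._

class Supervisor extends Actor with ActorLogging {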

  var replyTo: ActorRef = _

  val child = context.actorOf(
    Props(new Child)
      .withRouter(
        RoundRobinPool(
          nrOfInstances = 5,
          supervisorStrategy =
            AllForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 10.second) {
              case _: NullPointerException     => Restart
              case _: Exception                => Escalate
            })), name = "child-router")

  child ! GetRoutees

  def receive = {
    case RouterRoutees(routees) =>
      routees foreach context.watch

    case "start" =>
      replyTo = sender()
      child ! "error"

    case Terminated(actor) =>
      replyTo ! -1
      context.stop(self)
  }
}

class Child extends Actor with ActorLogging {

  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    log.info("***** RESTARTING *****")
    // Resend the message that caused the failure so it is retried right after the restart
    message foreach { msg => self forward msg }
  }

  def receive = LoggingReceive {
    case "error" =>
      log.info("***** GOT ERROR *****")
      throw new NullPointerException
  }
}

object Boot extends App {

  val system = ActorSystem()
  val supervisor = system.actorOf(Props[Supervisor], "supervisor")

  supervisor ! "start"

}

Am I doing something wrong, or missing something, to get the restarts spread over the time range?

EDIT

Actually, I misunderstood the purpose of withinTimeRange. To space out my retries over time, I'm now doing the following:

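override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
  // scheduleOnce needs an implicit ExecutionContext; the actor's dispatcher is used here
  import context.dispatcher
  log.info("***** RESTARTING *****")
  message foreach { msg =>
    // Resend the failed message to this actor after a 30-second delay
    context.system.scheduler.scheduleOnce(30.seconds, self, msg)
  }
}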

It seems to work ok.

Upvotes: 3

Views: 226

Answers (2)

ka4eli

Reputation: 5424

From the docs:

maxNrOfRetries - the number of times a child actor is allowed to be restarted, negative value means no limit, if the limit is exceeded the child actor is stopped

withinTimeRange - duration of the time window for maxNrOfRetries, Duration.Inf means no window

Your code means that when any child fails with a NullPointerException more than 3 times within 10 seconds, it will not be restarted again. Because of AllForOneStrategy, all routees are restarted as soon as the first routee fails. And because you've overridden preRestart to resend the failed message, the failure repeats immediately until it reaches the limit of 3 restarts within 10 seconds (which happens in less than a second).
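To make the window semantics concrete, here is a minimal plain-Scala sketch of the bookkeeping described above. It is a simplified model, not Akka's actual implementation: RestartWindow, permitRestart and simulate are hypothetical names introduced only for illustration, and timestamps are passed in as plain milliseconds so the behaviour is easy to follow.

```scala
import scala.concurrent.duration._

// Hypothetical helper, NOT part of Akka: models the restart budget of one child.
final class RestartWindow(maxNrOfRetries: Int, withinTimeRange: FiniteDuration) {
  private var windowStart: Option[Long] = None // millis at which the current window opened
  private var restarts = 0                      // restarts counted in the current window

  /** Returns true if a restart requested at `nowMillis` is still permitted. */
  def permitRestart(nowMillis: Long): Boolean = {
    val start = windowStart.getOrElse(nowMillis)
    if (nowMillis - start <= withinTimeRange.toMillis) {
      // Still inside the window: count this restart against the limit.
      windowStart = Some(start)
      restarts += 1
      restarts <= maxNrOfRetries
    } else {
      // The window has expired: open a new one and start counting again.
      windowStart = Some(nowMillis)
      restarts = 1
      true
    }
  }
}

object RestartWindowDemo {
  /** Feeds failure timestamps (millis) through a fresh 3-in-10-seconds window. */
  def simulate(failuresAtMillis: Seq[Long]): Seq[Boolean] = {
    val window = new RestartWindow(maxNrOfRetries = 3, withinTimeRange = 10.seconds)
    failuresAtMillis.map(window.permitRestart)
  }

  def main(args: Array[String]): Unit = {
    // Immediate resend from preRestart: four failures within a few milliseconds.
    // The first three restarts are permitted, the fourth is refused.
    println(simulate(Seq(0L, 5L, 10L, 15L)))
    // Failures spaced 30 seconds apart: each one lands in a new window,
    // so the limit is never exhausted.
    println(simulate(Seq(0L, 30000L, 60000L, 90000L)))
  }
}
```

In this model the window never delays anything. It only decides whether one more restart is allowed, which is why the restarts in the question all happen at once and any spacing has to come from somewhere else, such as a scheduler.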

Upvotes: 2

cmbaxter

Reputation: 35443

I think you have misunderstood the purpose of the withinTimeRange argument. That value is used together with maxNrOfRetries to define the time window over which the number of retries is limited. For example, with the values you have specified, the supervisor will no longer restart an individual child once that child needs to be restarted more than 3 times within 10 seconds.

Upvotes: 4
