Reputation: 400
I am integrating the use of Akka actors and Spark in the following way: when a task is distributed among the Spark nodes, while processing that tasks, each node also periodically sends metrics data to a different collector process that sits somewhere else on the network through the use of an Akka actor (connecting to the remote process through akka-remote).
The actor-based metrics sending/receiving functionality works just fine when used in standalone mode, but when integrated in a Spark task the following error is thrown:
java.lang.IllegalStateException: Trying to deserialize a serialized ActorRef without an ActorSystem in scope. Use 'akka.serialization.Serialization.currentSystem.withValue(system) { ... }'
at akka.actor.SerializedActorRef.readResolve(ActorRef.scala:407) ~[akka-actor_2.10-2.3.11.jar:na]
If I understood it correctly, the source of the problem is the Spark node being unable to deserialize the ActorRef because it does not have the full information required to do it. I understand that putting an ActorSystem in scope would fix it, but I am not sure how to use the suggested akka.serialization.Serialization.currentSystem.withValue(system) { ... }
The Akka official docs are very good in pretty much all topics they cover. Unfortunately, the chapter devoted to Serialization could be improved IMHO.
Note: there is a similar SO question here but the accepted solution is too specific and thus not really useful in the general case
Upvotes: 4
Views: 2154
Reputation: 17933
An ActorSystem
is responsible for all of the functionality involved with ActorRef
objects.
When you program something like
actorRef ! message
You're actually invoking a bunch of work within the ActorSystem, not the ActorRef, to put the message in the right mailbox, tee-up the Actor to run the receive method within the thread pool, etc... From the documentation:
An actor system manages the resources it is configured to use in order to run the actors which it contains. There may be millions of actors within one such system, after all the mantra is to view them as abundant and they weigh in at an overhead of only roughly 300 bytes per instance. Naturally, the exact order in which messages are processed in large systems is not controllable by the application author
That is why your code works fine "standalone", but not in Spark. Each of your Spark nodes is missing the ActorSystem machinery, therefore even if you could de-serialize the ActorRef in a node there would be no ActorSystem to process the !
in your node function.
You can establish an ActorSystem within each node and use (i) remoting to send messages to your ActorRef in the "master" ActorSystem via actorSelection
or (ii) the serialization method you mentioned where each node's ActorSystem would be the system
in the example you quoted.
Upvotes: 2