ankkitgupta

Reputation: 46

akka.net actor parallel execution

We are building a proof of concept with Akka.NET to process JSON files. I am struggling to find the best approach for batch-processing a JArray. In my implementation, the coordinator actor receives the following message:

// Message received by the coordinator actor
public class ValidatedInput
{
    public JArray Data { get; set; }
}

My coordinator actor can process the complete JArray in a single pass, as shown below, but I am struggling to start a number of parallel actors that would each process 50 records from the JArray.

// Coordinator actor receives the message and forwards it to the transform actor
public void Receiving()
{
    Receive<ValidatedInput>(x =>
    {
        TransformerRouter.Tell(x);
    });
}

// Transform actor receives the message and processes it (sample code)
Receive<ValidatedInput>(x =>
{
    PipeToSupport.PipeTo<TransformResult>(
        MapDataAsync(x).ContinueWith(data => new TransformResult()),
        Self);
});

Is there a way, along the lines of the snippet below, to pass 50 JArray records at a time to each actor and then collect the results? Something like:

Receive<ValidatedInputDataResult>(x =>
{
    TransformerRouter.Tell(x.Data.Take(50));
});

Upvotes: 2

Views: 863

Answers (1)

easuter

Reputation: 1197

Haven't used Akka.NET in a while, but when I did I always avoided passing around collections where possible, for two main reasons:

  • There is a limit on the size of a message that you can send to a remote actor (the transport's maximum frame size), and although this limit can be increased, doing so isn't recommended.

  • Messages that cross a process boundary (remoting or clustering) are serialized when sent and deserialized before they are Receive<>'d; purely local messages are passed by reference unless the serialize-messages setting is turned on. If you're sending arrays or other collections of objects in such a message, you risk large buffers being allocated on the Large Object Heap every time you use the Tell method, which is something you should avoid as much as possible if this is a hot code path.

The way I tackled this type of problem at the time was:

  • Have a "top level" coordinator actor that:
    • contains a router behind which the worker actors are. You can for example configure the router to distribute messages in a round-robin fashion.
    • spawns an "aggregator" actor every time a new message is Receive'd, to which the worker actors send their results. You can use the Tell overload that takes an explicit sender and pass the aggregator's actor reference, so that workers see the aggregator as the Sender in their actor context.
  • The router in the "top level" actor was configured to automatically spawn more actors when needed
  • The worker actors did nothing more than Receive a single message, process it and then Tell the result to the Sender in their context.
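
To make the list above more concrete, here is a minimal sketch of that layout. It is not tested against your project and makes several assumptions: Batch, BatchResult, AllDone, Worker, Aggregator and Coordinator are hypothetical names, the batch size of 50 comes from the question, the pool sizes (2 to 10) are arbitrary, and the "processing" in the worker is a placeholder that only counts records.

```csharp
using System.Collections.Generic;
using System.Linq;
using Akka.Actor;
using Akka.Routing;
using Newtonsoft.Json.Linq;

// All types below except ValidatedInput are hypothetical, for illustration only.
public sealed class ValidatedInput
{
    public ValidatedInput(JArray data) { Data = data; }
    public JArray Data { get; }
}

public sealed class Batch
{
    public Batch(IReadOnlyList<JToken> items) { Items = items; }
    public IReadOnlyList<JToken> Items { get; }
}

public sealed class BatchResult
{
    public BatchResult(int processed) { Processed = processed; }
    public int Processed { get; }
}

public sealed class AllDone
{
    public AllDone(int total) { Total = total; }
    public int Total { get; }
}

// Worker: handles one batch and replies to whoever is the Sender.
public sealed class Worker : ReceiveActor
{
    public Worker()
    {
        // Placeholder "processing": just count the records in the batch.
        Receive<Batch>(b => Sender.Tell(new BatchResult(b.Items.Count)));
    }
}

// Aggregator: created per ValidatedInput, collects results, then stops itself.
public sealed class Aggregator : ReceiveActor
{
    public Aggregator(int expected, IActorRef replyTo)
    {
        int received = 0, total = 0;
        Receive<BatchResult>(r =>
        {
            total += r.Processed;
            if (++received < expected) return;
            replyTo.Tell(new AllDone(total));
            Context.Stop(Self);
        });
    }
}

public sealed class Coordinator : ReceiveActor
{
    private const int BatchSize = 50;

    public Coordinator()
    {
        // Round-robin pool that starts with 2 workers and may grow to 10.
        var pool = new RoundRobinPool(2, new DefaultResizer(2, 10));
        var router = Context.ActorOf(Props.Create(() => new Worker()).WithRouter(pool), "workers");

        Receive<ValidatedInput>(msg =>
        {
            // Split the array into consecutive chunks of at most BatchSize records.
            var batches = msg.Data
                .Select((token, index) => new { token, index })
                .GroupBy(p => p.index / BatchSize, p => p.token)
                .Select(g => new Batch(g.ToList()))
                .ToList();

            var replyTo = Sender;
            if (batches.Count == 0) { replyTo.Tell(new AllDone(0)); return; }

            var expected = batches.Count;
            var aggregator = Context.ActorOf(Props.Create(() => new Aggregator(expected, replyTo)));

            // Passing the aggregator as the sender makes workers reply to it.
            foreach (var batch in batches) router.Tell(batch, aggregator);
        });
    }
}
```

Whoever sends the ValidatedInput to the coordinator gets a single AllDone back once every batch has been reported. The aggregator here has no timeout, so in real code you would probably want one in case a worker fails and its result never arrives.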

Please keep in mind that this advice may be incomplete: I was not very "fluent" with actor systems at the time, and I haven't actively used Akka.NET for about 6 months, so there is likely a nicer way to accomplish what you need.

I'd suggest searching on Google for "actor system patterns" and "Scala actor patterns", and reading through some open source Scala project source code, which will also give you some insight.

Lastly, a tip to avoid future headaches: message types should always be immutable. So your ValidatedInput should look something like this:

public class ValidatedInput
{
    // Get-only auto-property: it can only be assigned in the constructor.
    public JArray Data { get; }

    public ValidatedInput(JArray data)
    {
        Data = data;
    }
}

Or better yet:

public class ValidatedInput
{
    // Exposing a read-only view also stops receivers from modifying the collection.
    public IReadOnlyList<JToken> Data { get; }

    public ValidatedInput(IReadOnlyList<JToken> data)
    {
        Data = data;
    }
}

Hope this helps, and best of luck!

Upvotes: 4
