Reputation: 277
I am attempting to implement a message processing pipeline using actors. The steps of the pipeline include functions such as reading, filtering, augmentation and, finally, storage into a database. Something similar to this: http://sujitpal.blogspot.nl/2013/12/akka-content-ingestion-pipeline-part-i.html
The issue is that the reading, filtering and augmentation steps are much faster than the storage step which results in having a congested store actor and an unreliable system.
I am considering the following option: have the store actor pull the processed and ready to store messages. Is this a good option? better suggestions?
Thank you
Upvotes: 2
Views: 828
Reputation: 22374
You may consider several options:
if order of messages doesn't matter - just execute every storage operation inside separate actor (or future). It will cause all data storage to be doing in parallel - I recommend to use separate thread pool for that. If some messages are amendments to others or participate in same transaction - you may create separate actors only for each messageId/transactionId to avoid pessimistic/optimistic lock problems (don't forget to kill such actors on transaction end or by timeout) .
use bounded mailboxes (back-pressure) - then you will block new messages from your input if older are still not processed (for example you may block the receiving thread til message will be acknowledged by last actor in the chain). It will move responsibility to source system. It's working pretty much good with JMS durables - messages are storing in reliable way on JMS-broker side til your system finally have them processed.
combine the previous two
Upvotes: 3
Reputation: 111
You could also use receive pipeline on actors:
class PipelinedActor extends Actor with ReceivePipeline {
// Increment
pipelineInner { case i: Int ⇒ Inner(i + 1) }
// Double
pipelineInner { case i: Int ⇒ Inner(i * 2) }
def receive: Receive = { case any ⇒ println(any) }
}
actor ! 5 // prints 12 = (5 + 1) * 2
http://doc.akka.io/docs/akka/2.4/contrib/receive-pipeline.html
It suits your needs the best as you have small pipelining tasks before/after processing of the message by actor. Also it is blocking code but that is fine for your case, I believe
Upvotes: 0
Reputation: 69
The new AKKA stream (still in beta) has back pressure. It's designed to solve this problem.
Upvotes: 0
Reputation: 617
I am using an approach similar to this: Akka Work Pulling Pattern (source code here: WorkPullingPattern.scala). It has the advantage that it works both locally & with Akka Cluster. Plus the whole approach is fully asynchronous, no blocking at all.
If your processed "objects" won't all fit into memory, or one of the steps is slow, it is an awesome solution. If you spawn N workers, then N "tasks" will be processed at one time. It might be a good idea to put the "steps" into BalancingPools also with parallelism N (or less).
I have no idea if your processing "pipeline" is sequential or not, but if it is, just a couple hours ago I have developed a type safe abstraction based on the above + Shapeless library. A glimpse at the code, before it was merged with WorkPullingPattern is here: Pipeline.
It takes any pipeline of functions (of properly matching signatures), spawns them in BalancingPools, creates Workers and links them to a master actor which can be used for scheduling the tasks.
Upvotes: 3