user2963757
user2963757

Reputation: 107

2 akka streams with kafka in between best practices

I have a stream A which publish to a Kafka server and a stream B which is consuming from the Kafka service, processing and then publish to multiple Kafka topics. Stream A is producing with a rate of around 50 ms (publish to kafka included) and Stream B is processing and producing with a rate of 500 ms (so, 10 times slower). Due to this, even some records were publish now by stream A, it takes sometimes up to 5 minutes to be processed by stream B, when under high load (e.g. 50k records to be processed at once) which is not an alternative and close to unacceptable. My question is: what are the best practices for this scenario, in general, and what could be a quick approach to handle this? These streams are part of the same app. I know that maybe I only gave the big picture, but I'm looking for a starting point, any ideas are welcome.

Upvotes: 0

Views: 199

Answers (2)

posthumecaver
posthumecaver

Reputation: 1843

Ok, it sound to me for something that you need an Application Logic, you try to solve the problem with Technology.

If you can group the Events you produce under same key (for ex, you have Customer with Id: 111 and you send all CREATE, UPDATE, DELETE Events with same Key <-> Id: 111) then you can use a Topic with multiple partitions.

This way all the Event that are produced with same key, will land in same partition and would be guaranteed to be processed in order, in this way you can parallelise the consuming and processing, so with 10 partitions you can be as fast as producers.

If this is not possible, you have to use backpressure mechanisms of the Alpakka Kafka Streams and may be an Akka State Machine for the Application Logic part, which I explain in the following blog how it can be done.

Upvotes: 0

V-Lamp
V-Lamp

Reputation: 1678

The is no back-pressure mechanism for Kafka. If a downstream consumer is slower, the lag will grow.

The way to deal with this is to spin more instances of the consumers or make your consumer beefier (more CPU probably, but depends on what is the bottleneck).

It sounds like you have both upstream produces and downstream consumer in the same deployable. This is a bit questionable: Why not just let B consume directly from the source of A?

Upvotes: 1

Related Questions