Handling skewed processing time of events in a streaming application

I have a streaming application (written on Spark/Storm/whatever, it does not matter). Kafka is used as the source of stream events. Some events take significantly more resources (time, CPU, etc.) to process than others.

There are framework-specific nuances in how these larger messages are dealt with. For example:

  1. Spark Streaming's batch gets blocked until the event is processed.
  2. Storm keeps processing events until the maximum number of unacked messages has been reached for a partition.

Since in Kafka acking is only possible by committing an offset up to some message ID per partition, not at the individual message level, whenever one of these larger events arrives the application comes to a halt at some point. This is the price of the duplicate-processing tradeoff: if the application dies while processing a large message, how much work can you afford to redo, given that all messages after the large one would need to be replayed? Another issue is lag alerting: because the large message is stuck, even if I process the messages behind it, the committed offset does not move.
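To make the per-partition commit constraint concrete, here is a minimal sketch using the plain Kafka consumer API (the topic name, group ID, and `process` method are placeholders, not part of the actual setup):

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class PerPartitionCommitDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // a "large" event blocks everything behind it in this partition
                    // Committing offset+1 acknowledges ALL records of this partition up to and
                    // including this one; there is no way to ack a later record while skipping it.
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // placeholder for the actual (possibly expensive) work
    }
}
```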

Based on this understanding, I am reaching the conclusion that Kafka is better suited when all messages in a topic have similar processing times (at least Spark and Storm only offer tuning at the topic level, not at the individual partition level).

Hence, these are the options I have:

  1. Make my partitioning strategy ensure that all messages in one topic need roughly equal processing time (partition-level segregation would not work).
  2. Use a streaming source where acking at the individual message ID level is possible, for example a Redis queue or RabbitMQ.
  3. Make duplicate message handling very cheap (say, by doing a lookup to check whether a message has already been processed) and set the max unacked message limit very high (a sketch of such an idempotency check follows this list).
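As a rough sketch of option 3, the idempotency check could look like the following. The in-memory map stands in for whatever shared store (Redis, a database table, ...) the application would actually use, and the class and method names are purely illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Make reprocessing cheap: check whether a message ID was already handled
// before doing the expensive work, so replays after a failure cost little.
public class IdempotentProcessor {
    private final Map<String, Boolean> processed = new ConcurrentHashMap<>();

    /** Returns true if the message was processed now, false if it was a duplicate. */
    public boolean processOnce(String messageId, Runnable expensiveWork) {
        // putIfAbsent is atomic: only the first caller for a given ID wins.
        if (processed.putIfAbsent(messageId, Boolean.TRUE) != null) {
            return false; // already processed, skip the expensive work on replay
        }
        expensiveWork.run();
        return true;
    }
}
```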

Are there other options to deal with these scenarios?

Upvotes: 0

Views: 89

Answers (1)

Do you need to maintain per-key ordering during processing? If so, you could use a specialized consumer like Confluent's Parallel Consumer: https://github.com/confluentinc/parallel-consumer. It processes different keys in parallel while ensuring that records with the same key are processed sequentially (it also works with unordered keys). This lets small and large records from the same partition be processed in parallel, removing the head-of-line blocking issue. As you suggest, an idempotence mechanism would still be useful in case of failure.
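A minimal sketch of KEY-ordered processing with the Parallel Consumer, loosely following the project's README (the topic name and `process` method are placeholders, and the exact callback argument type varies between library versions):

```java
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.util.Collections;
import java.util.Properties;

public class KeyOrderedProcessing {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "parallel-consumer-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false"); // the parallel consumer manages offsets itself

        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props);

        ParallelConsumerOptions<String, String> options =
                ParallelConsumerOptions.<String, String>builder()
                        .ordering(ProcessingOrder.KEY) // same key -> sequential, different keys -> parallel
                        .maxConcurrency(16)            // small records keep flowing while a large one runs
                        .consumer(kafkaConsumer)
                        .build();

        ParallelStreamProcessor<String, String> processor =
                ParallelStreamProcessor.createEosStreamProcessor(options);
        processor.subscribe(Collections.singletonList("events")); // hypothetical topic name

        // Offsets below the highest contiguous completed record are committed by the library,
        // so a slow record no longer pins the committed offset for unrelated keys being worked on.
        processor.poll(record -> process(record));
    }

    private static void process(Object record) {
        // placeholder for the actual per-record work
    }
}
```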

Note that Queues for Kafka is coming in KIP-932

Upvotes: 1
