fledgling

Reputation: 1051

Order of messages with Spark Executors

I have a Spark Streaming application that streams data from Kafka. I rely heavily on the order of the messages, so I have created just one partition in the Kafka topic.

I am deploying this job in cluster mode.

My question is: since I am running this in cluster mode, more than one executor can pick up tasks. Will I lose the order of the messages received from Kafka in that case? If not, how does Spark guarantee ordering?
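For context, here is a minimal sketch of the kind of job I mean, using the Kafka 0.10 direct stream (the broker address, topic name, and group id are placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.KafkaUtils

    object SinglePartitionStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("single-partition-order")
        val ssc = new StreamingContext(conf, Seconds(5))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker:9092",               // placeholder
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "order-sensitive-group",              // placeholder
          "auto.offset.reset" -> "earliest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        // "events" has a single partition, so each batch RDD has one
        // Kafka-backed partition and records are read in offset order.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          PreferConsistent,
          Subscribe[String, String](Array("events"), kafkaParams)
        )

        stream.foreachRDD { rdd =>
          rdd.foreach(record => println(s"${record.offset} -> ${record.value}"))
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }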

Upvotes: 0

Views: 984

Answers (2)

Rahul Sharma

Reputation: 5834

With a single partition you lose the distributed processing power, so I would instead use multiple partitions and attach a sequence number to every message, either a counter or a timestamp.
If the messages do not already carry a timestamp, Kafka provides a record timestamp that you can extract and use to order the events, then process them in that sequence.

Refer to this answer on how to extract the timestamp from a Kafka message.
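As a rough sketch of that idea, assuming the Kafka 0.10 direct stream API, each ConsumerRecord already exposes the message timestamp, so you can order the events of a batch before processing them (processInOrder is just an illustrative helper):

    import org.apache.kafka.clients.consumer.ConsumerRecord
    import org.apache.spark.streaming.dstream.InputDStream

    // Illustrative helper, not a complete job: sort each micro-batch by the
    // Kafka record timestamp and then process the events in that order.
    def processInOrder(stream: InputDStream[ConsumerRecord[String, String]]): Unit = {
      stream.foreachRDD { rdd =>
        rdd
          .map(record => (record.timestamp, record.value)) // record.timestamp is the Kafka message timestamp
          .sortByKey()                                     // order by timestamp within the batch
          .foreach { case (ts, value) => println(s"$ts -> $value") }
      }
    }

Note that this orders events within a batch; batches themselves are processed one at a time by default.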

Upvotes: 1

Sachin Thapa

Reputation: 3709

To maintain order, using a single partition is the right choice. Here are a few other things you can try (a config sketch covering both points follows the list):

  1. Turn off speculative execution

spark.speculation - If set to "true", performs speculative execution of tasks. This means if one or more tasks are running slowly in a stage, they will be re-launched.

  2. Adjust your batch interval / sizes so that each batch can finish processing without any lag.
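A minimal config sketch of both points (the app name and the 5-second interval are just examples; tune the interval against your actual throughput):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Turn off speculative execution so a slow task is never re-launched
    // in parallel, which could otherwise process the same records twice
    // and out of order.
    val conf = new SparkConf()
      .setAppName("ordered-kafka-stream")
      .set("spark.speculation", "false")

    // Pick a batch interval large enough that each batch finishes before
    // the next one is due, so no scheduling lag builds up.
    val ssc = new StreamingContext(conf, Seconds(5))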

Cheers!

Upvotes: 0
