Spark Kafka Structured Streaming: Issue - Concurrent update to the log. Multiple streaming jobs detected

Question

I am experimenting running structured streaming from kafka source and sinking them back to kafka topics.

In my current set up, I am scheduling two spark jobs via spark-submit.

Each job reads from it's own unique Kafka topic. But both of them write to a shared topic.

My current spark-defaults.conf includes:

spark.streaming.concurrentJobs 5
spark.scheduler.mode FAIR

When both jobs are scheduled independently, they work as expected. However, when I try to schedule them together, by submitting one after the other, the job submitted first stops responding with logs:

java.lang.AssertionError: assertion failed: Concurrent update to the log. Multiple streaming jobs detected for 10
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcV$sp(MicroBatchExecution.scala:339)

Are there some confs that I am missing? How do we schedule concurrent jobs writing to the same Kafka topic in Spark? Appreciate your thoughts.

Edit: writing to the same Kafka topic

Edit: Formatted Question Title

Spark Kafka Structured Streaming: Issue - Concurrent update to the log. Multiple streaming jobs detected

Answers (1)

Related Questions