Spark Streaming Design for 1000+ topics

Question

I have to design a spark streaming application with below use case. I am looking for best possible approach for this.

I have application which pushing data into 1000+ different topics each has different purpose . Spark streaming will receive data from each topic and after processing it will write back to corresponding another topic.

Ex. 

Input Type 1 Topic  --> Spark Streaming --> Output Type 1 Topic 
Input Type 2 Topic  --> Spark Streaming --> Output Type 2 Topic 
Input Type 3 Topic  --> Spark Streaming --> Output Type 3 Topic 
.
.
.
Input Type N Topic  --> Spark Streaming --> Output Type N Topic  and so on.

I need to answer following questions.

Is it a good idea to launch 1000+ spark streaming application per topic basis ? Or I should have one streaming application for all topics as processing logic going to be same ?
If one streaming context , then how will I determine which RDD belongs to which Kafka topic , so that after processing I can write it back to its corresponding OUTPUT Topic?
Client may add/delete topic from Kafka , how do dynamically handle in Spark streaming ?
How do I restart job automatically on failure ?

Any other issue you guys see here ?

Highly appreicate your response.

Spark Streaming Design for 1000+ topics

Answers (1)

Related Questions