Saurabh Sagar

Reputation: 1

spark-submit configuration to handle a load of 60K events per second

I'm setting up a spark-submit job to handle more than 60k events per second. What should my batch interval and my driver, executor, node, and core settings be to handle that load?

I have tried batch intervals from 1 minute to 10 minutes, with executor memory from 4 GB to 30 GB and the number of cores from 10 to 60.

spark-submit \
  --conf "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" \
  --master spark://masterURL:7077 \
  --deploy-mode cluster \
  --driver-memory 30g \
  --num-executors 60 \
  --executor-cores 10 \
  --executor-memory 30g \
  --conf "spark.scheduler.mode=FAIR" \
  --class "MainClass" SampleJar.jar

Each scheduled batch should finish processing before the next batch is scheduled.
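For context, the batch interval I'm referring to is the one set when constructing the streaming context, roughly like this (a minimal sketch; the app name and the 1-minute interval are placeholders, not my actual production values):

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class MainClass {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("MainClass");
        // Batch interval; I have tried values from 1 minute up to 10 minutes.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.minutes(1));

        // ... define the Kafka input stream and per-batch processing here ...

        jssc.start();
        jssc.awaitTermination();
    }
}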

Upvotes: 0

Views: 71

Answers (1)

Saurabh Sagar

Reputation: 1

Eventually, after struggling with a bunch of different configurations and reading most of the performance blogs on memory tuning recommendations, I figured out a fix for this. It was almost straightforward in terms of implementation.

The problem was that the Kafka throughput was not matching the streaming job's processing power. It got resolved by repartitioning the JavaInputDStream to a higher number of partitions.
This spins up more tasks on the Spark cluster and lets Spark do more parallel processing; otherwise the streaming job is always stuck at the parallelism of the number of Kafka partitions. A sketch of the change is shown below.
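A minimal sketch of what I mean, assuming the spark-streaming-kafka-0-10 direct stream API; the broker address, topic, group id, and the target partition count (120) are placeholders for illustration. The records are mapped to their values before repartitioning, since ConsumerRecord itself is not serializable:

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class RepartitionExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("RepartitionExample");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.minutes(1));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker:9092");         // placeholder
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "events-group");                 // placeholder

        Collection<String> topics = Arrays.asList("events");         // placeholder topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Without repartitioning, parallelism is capped at the number of Kafka
        // partitions. Repartition to a higher number so each batch is processed
        // by more parallel tasks across the cluster.
        JavaDStream<String> repartitioned = stream
            .map(ConsumerRecord::value)
            .repartition(120);

        repartitioned.foreachRDD(rdd -> {
            // ... per-batch processing goes here ...
        });

        jssc.start();
        jssc.awaitTermination();
    }
}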

Hope this will help resolve someone's problem.

Upvotes: 0
