Reputation: 1098
I am running a standalone Spark cluster and submitting my applications (written in SparkR) using spark-submit
in client mode. I have a set of applications that have to be run depending on the user's input, so I can't keep them running all the time. Each time I submit an application, it takes 15-20 seconds before it starts processing data.
Can this time be reduced in any way? I have read about running a web server on the driver machine, but I'm not sure how that can be done. Also, I am not using a cluster manager (like YARN), just a standalone cluster.
Also, do resources on the client or the cluster such as CPU cores and memory affect this startup time?
Upvotes: 3
Views: 2404
Reputation: 1801
Using a Spark job-server to share SparkContexts across applications could help you shave off start-up time. (I am not sure if you need this since your start-up time of ~20s seems quite low.)
The popular Spark job-servers which provide context-sharing are:
- spark-jobserver
- Apache Livy
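For instance, with Livy you pay the context start-up cost once when the session is created, and each subsequent user request is just a statement run against the already warm context. Below is a minimal sketch of that flow using Python and the `requests` library; the Livy URL (default port 8998) and the SparkR snippet are placeholders you would replace with your own setup.

```python
import json
import time

import requests

# Assumed Livy endpoint; 8998 is Livy's default port.
LIVY_URL = "http://localhost:8998"
HEADERS = {"Content-Type": "application/json"}

# 1. Create a long-lived SparkR session once. This is where the
#    15-20 second context start-up cost is paid, a single time.
resp = requests.post(LIVY_URL + "/sessions",
                     data=json.dumps({"kind": "sparkr"}),
                     headers=HEADERS)
session_url = "{}/sessions/{}".format(LIVY_URL, resp.json()["id"])

# Wait until the session (and its SparkContext) is ready.
while requests.get(session_url, headers=HEADERS).json()["state"] != "idle":
    time.sleep(1)

# 2. Each user request reuses the warm context; only the statement runs.
stmt = {"code": "df <- as.DataFrame(faithful); nrow(df)"}  # placeholder SparkR code
resp = requests.post(session_url + "/statements",
                     data=json.dumps(stmt),
                     headers=HEADERS)
stmt_url = "{}/statements/{}".format(session_url, resp.json()["id"])

# Poll until the statement has finished, then print its output.
while True:
    result = requests.get(stmt_url, headers=HEADERS).json()
    if result["state"] == "available":
        print(result["output"])
        break
    time.sleep(1)
```

spark-jobserver follows the same idea with named, persistent contexts addressed over its REST API; whichever server you pick, the trade-off is that applications sharing a context also share executors, so you lose some isolation in exchange for the faster turnaround.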
Also, do resources on the client or the cluster such as CPU cores and memory affect this startup time?
Not really. The available resources should only affect the execution time of your applications, not how long spark-submit takes to start them.
Upvotes: 1