yberg

Reputation: 110

Running a distributed Spark Job Server with multiple workers in a Spark standalone cluster

I have a Spark standalone cluster running on a few machines. All workers are using 2 cores and 4GB of memory. I can start a job server with ./server_start.sh --master spark://ip:7077 --deploy-mode cluster --conf spark.driver.cores=2 --conf spark.driver.memory=4g, but whenever I try to start a server with more than 2 cores, the driver's state gets stuck at "SUBMITTED" and no worker takes the job.

I tried starting the spark-shell on 4 cores with ./spark-shell --master spark://ip:7077 --conf spark.driver.cores=4 --conf spark.driver.memory=4g and the job gets shared between 2 workers (2 cores each). The spark-shell gets launched as an application and not a driver though.

Is there any way to run a driver split between multiple workers? Or can I run the job server as an application rather than a driver?

Upvotes: 0

Views: 1772

Answers (1)

Daniel de Paula

Reputation: 17872

The problem was resolved in chat; here is a summary of the solution.

You have to change your JobServer .conf file to set the master parameter to point to your cluster:

master = "spark://ip:7077"

Also, the memory that the JobServer program itself uses can be set in the settings.sh file.
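
If you deploy with the packaged scripts, the JVM memory of the JobServer process is typically controlled by a variable in settings.sh; the variable name below is taken from the standard deployment template and should be checked against your version:

# settings.sh -- memory for the JobServer JVM (assumed variable name from the deployment template)
JOBSERVER_MEMORY=4G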

After setting these parameters, you can start JobServer with a simple call:

./server_start.sh

Then, once the service is running, you can create your context via REST; the context will ask the cluster for resources and receive an appropriate number of executors/cores:

curl -d "" '[hostname]:8090/contexts/cassandra-context?context-factory=spark.jobserver.context.CassandraContextFactory&num-cpu-cores=8&memory-per-node=2g'

Finally, every job sent via POST to JobServer on this created context will be able to use the executors allocated to the context and will therefore run in a distributed way.
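
As an example, a job could then be submitted to that context with a POST like the following; appName, classPath, and the input config are placeholders for a jar you have already uploaded and the job class inside it:

# appName, classPath, and input.string below are placeholders for your own uploaded jar and job
curl -d "input.string = a b c" '[hostname]:8090/jobs?appName=my-app&classPath=com.example.MyJob&context=cassandra-context'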

Upvotes: 2
