Dagang Wei

Reputation: 26548

Run external shuffle service in Spark standalone mode

I'm deploying Spark in standalone mode. In the docs, I can't find how to run the external shuffle service in standalone mode. What command should I use? Or is it even supported?

Update

I found sbin/start-shuffle-service.sh in Spark 2.4, but not in Spark 3.1. In Spark 3.1, I only found sbin/start-mesos-shuffle-service.sh, which seems related but is not for standalone mode. Is it no longer supported?

Upvotes: 1

Views: 1148

Answers (1)

Ged

Reputation: 18108

It is indeed supported; see https://books.japila.pl/apache-spark-internals/external-shuffle-service/ and the Spark manuals, e.g. https://spark.apache.org/docs/latest/job-scheduling.html. The manuals are unfortunately hard to follow.

Do the following: in standalone mode, start your Workers with spark.shuffle.service.enabled set to true, i.e. set spark.shuffle.service.enabled to true in the conf/spark-defaults.conf file on every Worker in your cluster. (The YARN approach is handier, I believe.)
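A minimal conf/spark-defaults.conf sketch; the spark.shuffle.service.port line is optional and shown with its documented default:

 # conf/spark-defaults.conf on every Worker machine
 spark.shuffle.service.enabled   true
 # Optional: the port the external shuffle service listens on (default 7337)
 spark.shuffle.service.port      7337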

Or do the following in code:

 import org.apache.spark.sql.SparkSession

 // Enable the external shuffle service for this application.
 val sparkSession: SparkSession = SparkSession.builder()
   .appName("SaveMode test")
   .master("spark://localhost:7077")
   .config("spark.shuffle.service.enabled", true)
   .getOrCreate()
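Equivalently, the setting can be passed at submit time with --conf; a minimal sketch, where the master URL, main class, and JAR name are placeholders:

 # com.example.Main and my-app.jar are hypothetical placeholders
 spark-submit \
   --master spark://localhost:7077 \
   --conf spark.shuffle.service.enabled=true \
   --class com.example.Main \
   my-app.jar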

Start the external shuffle service with ./start-shuffle-service.sh on your cluster.

Note there is also the spark.shuffle.service.db.enabled parameter, which applies only to standalone mode and controls whether the shuffle service persists its state to a local database so registered executors can be recovered after a restart.
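A minimal sketch of setting it in conf/spark-defaults.conf (recent releases document true as the default, but check your version):

 spark.shuffle.service.db.enabled  true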

https://www.waitingforcode.com/apache-spark/external-shuffle-service-apache-spark/read is a good guide here; the manuals are a letdown.

Upvotes: 2
