Reputation: 26548
I'm deploying Spark in standalone mode. In the docs, I can't find how to run the external shuffle service in standalone mode. What command should I use? Or is it even supported?
Update
I found sbin/start-shuffle-service.sh in Spark 2.4, but not in Spark 3.1. In Spark 3.1, I only found sbin/start-mesos-shuffle-service.sh, which seems related but is not for standalone mode. Is it no longer supported?
Upvotes: 1
Views: 1148
Reputation: 18108
It is indeed supported; see https://books.japila.pl/apache-spark-internals/external-shuffle-service/ and the Spark manuals, e.g. https://spark.apache.org/docs/latest/job-scheduling.html. Unfortunately the manuals are hard to follow. Do the following:
In standalone mode, start your Workers with spark.shuffle.service.enabled set to true, i.e. put spark.shuffle.service.enabled true in the spark-defaults.conf file for every Worker in your cluster (a sample entry is sketched after the code below). The YARN approach is handier, I believe. Or do the following in code:
import org.apache.spark.sql.SparkSession

// Enable the external shuffle service for this application at session build time.
val sparkSession: SparkSession = SparkSession.builder()
  .appName("SaveMode test")
  .master("spark://localhost:7077")
  .config("spark.shuffle.service.enabled", true)
  .getOrCreate()
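For the spark-defaults.conf route mentioned above, a minimal sketch of the entry, assuming the standard $SPARK_HOME/conf/spark-defaults.conf location (the port line is optional; 7337 is simply the default):

# Enable the external shuffle service for every Worker reading this file
spark.shuffle.service.enabled  true
# Optional: the port the external shuffle service listens on (7337 is the default)
spark.shuffle.service.port     7337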
Start the external shuffle service with ./start-shuffle-service.sh on your cluster.
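A minimal sketch of that step, assuming a Spark 2.x layout where the script lives under $SPARK_HOME/sbin (per the question above, the script may be missing from the 3.1 distribution):

# Assumed layout: run on every Worker node
$SPARK_HOME/sbin/start-shuffle-service.sh
# To stop it later (stop script assumed present alongside the start script):
$SPARK_HOME/sbin/stop-shuffle-service.sh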
Note there is also the spark.shuffle.service.db.enabled parameter, which controls whether the shuffle service persists its state to a local database so it can reload registered executors after a restart.
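Since the external shuffle service mainly exists to support dynamic allocation (the topic of the job-scheduling page linked above), a minimal sketch combining the two; spark.dynamicAllocation.enabled is the standard key and the app name is just a placeholder:

import org.apache.spark.sql.SparkSession

// Dynamic allocation needs a way to serve shuffle files from released
// executors, which is what the external shuffle service provides.
val spark = SparkSession.builder()
  .appName("Dynamic allocation sketch")
  .master("spark://localhost:7077")
  .config("spark.shuffle.service.enabled", true)
  .config("spark.dynamicAllocation.enabled", true)
  .getOrCreate()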
https://www.waitingforcode.com/apache-spark/external-shuffle-service-apache-spark/read is a good guide here; the manuals are a letdown.
Upvotes: 2