Reputation: 5689
On a reasonably equipped 64-bit Fedora (home) server with 12 cores and 64 GB RAM, I have Spark 2.4 running in Standalone mode with the following configuration in ./spark-env.sh (where not shown are the items in that file that I have left commented out):
# =====================================================================
# Options for the daemons used in the standalone deploy mode
# =====================================================================
export SPARK_MASTER_HOST=dstorm
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080 # JupyterLab uses port 8888.
# ---------------------------------------------------------------------
export SPARK_WORKER_CORES=3 # 12 # To Set number of worker cores to use on this machine.
export SPARK_WORKER_MEMORY=4g # Total RAM workers have to give executors (e.g. 2g)
export SPARK_WORKER_WEBUI_PORT=8081 # Default: 8081
export SPARK_WORKER_INSTANCES=4 # 5 # Number of workers on this server.
# ---------------------------------------------------------------------
export SPARK_DAEMON_MEMORY=1g # To allocate to MASTER, WORKER and HISTORY daemons themselves (Def: 1g).
# =====================================================================
# =====================================================================
# Generic options for the daemons used in the standalone deploy mode
# =====================================================================
export SPARK_PID_DIR=${SPARK_HOME}/pid # PID file location.
# =====================================================================
After starting the Spark MASTER and WORKERS under this configuration, I then start Jupyter with just two Notebook tabs that point to this Spark Standalone Cluster.
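(For reference, the daemons are brought up with the stock sbin scripts; a sketch of the commands I use, assuming the standard Spark 2.4 layout:)
$SPARK_HOME/sbin/start-master.sh    # starts the MASTER on spark://dstorm:7077
$SPARK_HOME/sbin/start-slaves.sh    # starts SPARK_WORKER_INSTANCES worker daemons per host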
My issue is that just one Notebook tab's worth of cells -- by about the 5th or 6th cell -- consumes all the cores, leaving the second tab starved: it makes no progress because it waits for (but never gets) resources. I can confirm this in the Spark UI: a RUNNING status for the first Notebook tab's application, holding all cores, and a WAITING status for the second tab's application, with 0 cores. This is despite the fact that the first Notebook has completed its run (i.e. reached the bottom and finished its last cell).
By the way, this waiting is not restricted to Jupyter. If I next start Python/PySpark on the CLI and connect to the same cluster, it has to wait, too.
In all three cases I get a session like this:
spark_sesn = SparkSession.builder.config(conf = spark_conf).getOrCreate()
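(For completeness, spark_conf is nothing special; a minimal sketch of what it looks like, with the app name made up and the master URL taken from SPARK_MASTER_HOST/PORT above:)
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Minimal conf pointing at the standalone master defined in spark-env.sh;
# the app name here is just an example.
spark_conf = (SparkConf()
              .setAppName("light-test")
              .setMaster("spark://dstorm:7077"))

spark_sesn = SparkSession.builder.config(conf=spark_conf).getOrCreate()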
Note that there is nothing heavy-duty going on in these notebook tabs or on the CLI. On the contrary, it's super light (just for testing).
Did I configure something wrong, or is my underlying concept of how work gets distributed incorrect? I thought there would be multiplexing here, not blocking. Perhaps it's a session-sharing issue (i.e. .getOrCreate())?
I've tried playing with the CORES + WORKER-INSTANCES combination (e.g. 12 and 5, respectively), but the same issue arises.
Hmmm. Well I will keep investigating (it's time for bed). =:)
Thank you in advance for your inputs and insights.
Upvotes: 0
Views: 1227
Reputation: 21
Have you started the shuffle service? If not, you need to do it this way:
$SPARK_HOME/sbin/start-shuffle-service.sh
Then you need to enable dynamic allocation and tell your SparkSession that the shuffle service is enabled.
To do so, declare it in your SparkConf():
spark.dynamicAllocation.enabled = true
spark.shuffle.service.enabled = true
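For instance, a minimal sketch of setting these in PySpark before calling getOrCreate() (the property names are the standard Spark ones; the idle-timeout value is just an example):
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Enable dynamic allocation against the external shuffle service
spark_conf = (SparkConf()
              .set("spark.dynamicAllocation.enabled", "true")
              .set("spark.shuffle.service.enabled", "true")
              .set("spark.dynamicAllocation.executorIdleTimeout", "60s"))  # example value

spark_sesn = SparkSession.builder.config(conf=spark_conf).getOrCreate()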
Look at: https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
Once an executor has been idle for the "spark.dynamicAllocation.executorIdleTimeout" amount of time, it will be removed and its cores freed. You can see this on the Standalone Master UI and the Spark app UI.
Another good link: https://dzone.com/articles/spark-dynamic-allocation
Upvotes: 2