Reputation: 453
I'm trying to submit a spark application on a cluster with the following specs on GCP Dataproc:
Following the guides I found on memory and executor tuning on YARN, I derived the following values for the application parameters:
spark = SparkSession.builder \
    .appName("test") \
    .master("yarn") \
    .config("spark.submit.deployMode", "client") \
    .config("spark.executor.instances", "3") \
    .config("spark.executor.memory", "10g") \
    .config("spark.executor.cores", "3") \
    .enableHiveSupport() \
    .getOrCreate()
As far as spark.executor.memory
is concerned, I should be well within the limits, since I've reserved 1 GB of RAM for the OS and Hadoop daemons. Therefore, considering memory overhead, my limit should be
max(384MB, 0.07 * spark.executor.memory) ---> max(384MB, 0.07 * 14GB) = max(384MB, 0.98GB) = approx 1GB
so 15GB - 2GB = 13GB, and I specified 10GB just to be safe.
Available cores are 4 - 1 = 3, since, as I just said, 1 core is reserved.
I would expect to see 3 executors in the application UI, but I only get 2. I also tried specifying spark.executor.cores=2
instead of 3, to no avail.
Am I missing something?
Thanks
Upvotes: 2
Views: 701
Reputation: 26458
Dataproc enables Spark dynamic allocation by default, so you need to set spark.dynamicAllocation.enabled=false
to use static allocation.
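For example, static allocation could be requested directly in the builder from the question (a sketch; this just adds the one config key and otherwise keeps your settings):

```python
from pyspark.sql import SparkSession

# Disable Dataproc's default dynamic allocation so that
# spark.executor.instances is honored as a fixed executor count.
spark = SparkSession.builder \
    .appName("test") \
    .master("yarn") \
    .config("spark.submit.deployMode", "client") \
    .config("spark.dynamicAllocation.enabled", "false") \
    .config("spark.executor.instances", "3") \
    .config("spark.executor.memory", "10g") \
    .config("spark.executor.cores", "3") \
    .enableHiveSupport() \
    .getOrCreate()
```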
Also note that YARN NodeManager doesn't get all of the worker node memory, a portion (~20%) of it is reserved for services including NodeManager itself. Check the YARN UI or config for the actual memory.
Upvotes: 1