nonoDa

Reputation: 453

Spark on YARN unexpected number of executors with Google Cloud Dataproc

I'm trying to submit a Spark application on a GCP Dataproc cluster whose worker nodes each have 4 cores and 15 GB of RAM.

Following the guides I found on memory and executor tuning on YARN, I derived the following values for the application parameters:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("test") \
    .master("yarn") \
    .config("spark.submit.deployMode", "client") \
    .config("spark.executor.instances", "3") \
    .config("spark.executor.memory", "10g") \
    .config("spark.executor.cores", "3") \
    .enableHiveSupport() \
    .getOrCreate()

As far as spark.executor.memory is concerned, I should be well within the limits, since I've reserved 1 GB of RAM for the OS and Hadoop daemons. Therefore, considering memory overhead, my limit should be

max(384 MB, 0.07 * spark.executor.memory) → max(384 MB, 0.07 * 14 GB) = max(384 MB, 0.98 GB) ≈ 1 GB

So 15 GB - 2 GB = 13 GB, and I specified 10 GB just to be safe.
Available cores are 4 - 1 = 3, since, as I said, 1 core is reserved.
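
Just to make the math explicit, here is the same sizing calculation as a quick Python sketch (using the figures above and the 0.07 overhead guideline from the tuning guides, which may differ from Spark's exact default):

node_memory_gb = 15      # RAM per worker node
reserved_gb = 1          # reserved for OS and Hadoop daemons
executor_memory_gb = 10  # spark.executor.memory

# Overhead guideline: max(384 MB, 0.07 * spark.executor.memory)
overhead_gb = max(384 / 1024, 0.07 * executor_memory_gb)

container_gb = executor_memory_gb + overhead_gb
available_gb = node_memory_gb - reserved_gb

print(f"overhead  = {overhead_gb:.2f} GB")
print(f"container = {container_gb:.2f} GB per executor")
print(f"available = {available_gb} GB per node")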

I would expect to see 3 executors in the application UI, but I only get 2. I also tried specifying spark.executor.cores=2 instead of 3, to no avail.

Am I missing something?

Thanks.

Upvotes: 2

Views: 701

Answers (1)

Dagang Wei

Reputation: 26458

Dataproc enables Spark dynamic allocation by default, so you need to set spark.dynamicAllocation.enabled=false to use static allocation.
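For example, with the builder from your question, disabling dynamic allocation would look like this (a minimal sketch; the other settings are unchanged):

from pyspark.sql import SparkSession

# Same builder as in the question, with dynamic allocation explicitly
# disabled so spark.executor.instances is honored.
spark = SparkSession.builder \
    .appName("test") \
    .master("yarn") \
    .config("spark.submit.deployMode", "client") \
    .config("spark.dynamicAllocation.enabled", "false") \
    .config("spark.executor.instances", "3") \
    .config("spark.executor.memory", "10g") \
    .config("spark.executor.cores", "3") \
    .enableHiveSupport() \
    .getOrCreate()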

Also note that the YARN NodeManager doesn't get all of the worker node's memory; a portion (~20%) is reserved for services, including the NodeManager itself. Check the YARN UI or the config for the actual memory.
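
For instance, one way to see the actual NodeManager memory is to read yarn.nodemanager.resource.memory-mb from yarn-site.xml on a worker node (a minimal sketch; /etc/hadoop/conf is assumed as the usual Dataproc config location):

import xml.etree.ElementTree as ET

# Read yarn.nodemanager.resource.memory-mb from the Hadoop config directory
# (assumed to be /etc/hadoop/conf on a Dataproc node).
tree = ET.parse("/etc/hadoop/conf/yarn-site.xml")
for prop in tree.getroot().findall("property"):
    if prop.findtext("name") == "yarn.nodemanager.resource.memory-mb":
        print("NodeManager memory (MB):", prop.findtext("value"))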

Upvotes: 1
