Bibzon

Reputation: 256

How to set Spark configuration to use TASK nodes on AWS?

I am not a Spark configuration expert and I have an issue with the task nodes. My cluster on AWS EMR has 1 Master, 3 Core, and 5 Task nodes. I can see load only on the Master node and the 3 Core nodes; the Task nodes are doing nothing.

Instances: (screenshot of the EMR instance groups)

My configuration:

    .set("spark.executors.cores", "5")\
    .set("spark.submit.deployMode", "cluster")\
    .set("spark.yarn.executor.memoryOverhead", "1024")\
    .set("spark.sql.shuffle.partitions","108")\
    .set("spark.default.parallelism", "108")\
    .set("spark.yarn.node-labels.enabled","true")\
    .set("spark.yarn.node-labels.am.default-node-label-expression", "CORE")\
    .set("spark.yarn.executor.nodeLabelExpression","TASK")\
    .set("spark.yarn.nodemanager.vmem-check-enabled", "false")\
    .set("spark.yarn.node-labels.configuration-type", 'distributed')\
    .set("spark.memory.fraction", "0.8")\
    .set("spark.memory.storageFraction", "0.2")\
    .set("maximizeResourceAllocation","true")\ 

Is there any option in the configuration to solve this issue?

Upvotes: 0

Views: 1066

Answers (1)

SnigJi

Reputation: 1410

I don’t think there is a separate node label called TASK.

TASK is part of the default node label. If you look at my cluster, I have 10 TASK nodes running, but they are part of the DEFAULT partition. So remove the property .set("spark.yarn.executor.nodeLabelExpression", "TASK").

YARN node labels (screenshot)
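As a sketch, assuming the rest of the config stays as posted, the question's SparkConf without any node-label expression would look like this; with no label requested, YARN can place executors on the DEFAULT partition, which includes the TASK nodes:

    from pyspark import SparkConf

    # Sketch: same settings as in the question, minus the TASK label expression.
    # Without a label expression, executor containers are scheduled on the
    # DEFAULT partition, which is where the TASK nodes live.
    conf = SparkConf()\
        .set("spark.executor.cores", "5")\
        .set("spark.submit.deployMode", "cluster")\
        .set("spark.yarn.executor.memoryOverhead", "1024")\
        .set("spark.sql.shuffle.partitions", "108")\
        .set("spark.default.parallelism", "108")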

Also, add this to your Spark config:

spark.dynamicAllocation.enabled=true
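In the SparkConf style used in the question, a minimal sketch would be the lines below. Note that on YARN, dynamic allocation relies on the external shuffle service; EMR enables it by default, but it is shown here explicitly:

    # Sketch: let Spark scale the executor count up and down with the workload.
    # Dynamic allocation on YARN requires the external shuffle service.
    conf = conf\
        .set("spark.dynamicAllocation.enabled", "true")\
        .set("spark.shuffle.service.enabled", "true")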

Also, I don’t think you need to specify these two properties; they are already configured in yarn-site.xml:

    .set("spark.yarn.node-labels.enabled", "true")\
    .set("spark.yarn.node-labels.am.default-node-label-expression", "CORE")
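For reference, on EMR 5.x clusters the corresponding YARN settings are typically preconfigured in /etc/hadoop/conf/yarn-site.xml on the master node (the values below are the usual EMR defaults; verify on your own cluster):

    yarn.node-labels.enabled=true
    yarn.node-labels.am.default-node-label-expression=CORE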


Upvotes: 1
