Reputation: 256
I am not a Spark configuration expert and I have an issue with the task nodes. My cluster in AWS has 1 Master, 3 Core, and 5 Task nodes. I can see load only on the Master node and on the 3 Core nodes; the Task nodes are doing nothing.
Instances:
My configuration:
.set("spark.executors.cores", "5")\
.set("spark.submit.deployMode", "cluster")\
.set("spark.yarn.executor.memoryOverhead", "1024")\
.set("spark.sql.shuffle.partitions","108")\
.set("spark.default.parallelism", "108")\
.set("spark.yarn.node-labels.enabled","true")\
.set("spark.yarn.node-labels.am.default-node-label-expression", "CORE")\
.set("spark.yarn.executor.nodeLabelExpression","TASK")\
.set("spark.yarn.nodemanager.vmem-check-enabled", "false")\
.set("spark.yarn.node-labels.configuration-type", 'distributed')\
.set("spark.memory.fraction", "0.8")\
.set("spark.memory.storageFraction", "0.2")\
.set("maximizeResourceAllocation","true")\
Is there any option in the configuration to solve this issue?
Upvotes: 0
Views: 1066
Reputation: 1410
I don’t think there is a separate node label called TASK. TASK nodes are part of the default node label (the default partition). In my cluster, for example, I have 10 TASK nodes running and they all belong to the default partition. So remove this property:
.set("spark.yarn.executor.nodeLabelExpression", "TASK")
Also, can you add this to your Spark config:
spark.dynamicAllocation.enabled=true
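In the SparkConf style used in the question, that could look roughly like the following. This is a minimal sketch, not your exact job: the app name is a placeholder, and spark.shuffle.service.enabled, which dynamic allocation on YARN normally relies on, is usually already enabled by default on EMR.

from pyspark import SparkConf, SparkContext

# Dynamic allocation lets YARN grow/shrink the number of executors with the
# workload, which is what allows containers to land on the TASK instances.
conf = (SparkConf()
        .setAppName("dynamic-allocation-example")        # placeholder app name
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true"))    # prerequisite for dynamic allocation on YARN

sc = SparkContext(conf=conf)
print(sc.getConf().get("spark.dynamicAllocation.enabled"))  # should print "true"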
Also, I don’t think you need to specify these two properties; they are already configured in yarn-site.xml:
.set("spark.yarn.node-labels.enabled","true")\
.set("spark.yarn.node-labels.am.default-node-label-expression", "CORE")
Upvotes: 1