Sanchay

Reputation: 1113

AWS EMR Spark - CloudWatch

I was running an application on AWS EMR Spark. Here is the spark-submit job:

Arguments: spark-submit --deploy-mode cluster --class com.amazon.JavaSparkPi s3://spark-config-test/SWALiveOrderModelSpark-1.0.assembly.jar s3://spark-config-test/2017-08-08

AWS EMR uses YARN for resource management. I had a couple of doubts while observing the CloudWatch metrics:

1) Screenshot attached

What does "containers allocated" imply here? I am using 1 master and 3 slave/executor nodes (all 4 have 8-core CPUs).

2) Screenshot attached

I changed the command to:

spark-submit --deploy-mode cluster --executor-cores 4 --class com.amazon.JavaSparkPi s3://spark-config-test/SWALiveOrderModelSpark-1.0.assembly.jar s3://spark-config-test/2017-08-08

Here the number of cores running is shown as 3. Should it not be 3 (executors) * 4 (cores per executor) = 12?

Upvotes: 0

Views: 1872

Answers (1)

Sanchay

Reputation: 1113

1) "Containers allocated" here basically represents the number of Spark executors. Spark executor-cores are more like executor tasks, meaning you could configure your app to run one executor per physical CPU and still ask it to have 3 executor-cores per CPU (think hyper-threading).
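For instance, the relationship can be made explicit on the command line (just a sketch reusing the jar and arguments from the question; the executor and core counts are illustrative, not what YARN actually granted on my cluster):

# 3 executors x 4 cores each = 12 task slots in total (illustrative numbers)
spark-submit --deploy-mode cluster \
  --num-executors 3 \
  --executor-cores 4 \
  --class com.amazon.JavaSparkPi \
  s3://spark-config-test/SWALiveOrderModelSpark-1.0.assembly.jar \
  s3://spark-config-test/2017-08-08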

By default on EMR, when you don't specify the number of Spark executors, dynamic allocation is assumed and Spark asks YARN only for the resources it thinks it needs. I tried explicitly setting the number of executors to 10, and the containers allocated only went up to 6 (the maximum number of data partitions). Also, under the "Application history" tab you can get a detailed view of the YARN/Spark executors.
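For example, to turn dynamic allocation off and pin the executor count explicitly (again only a sketch; --num-executors and spark.dynamicAllocation.enabled are standard spark-submit options, and 10 is simply the value I was experimenting with):

# Disable dynamic allocation so YARN is asked for a fixed number of executors
spark-submit --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 10 \
  --executor-cores 4 \
  --class com.amazon.JavaSparkPi \
  s3://spark-config-test/SWALiveOrderModelSpark-1.0.assembly.jar \
  s3://spark-config-test/2017-08-08

Even with a fixed request like this, YARN will still only grant what the cluster can actually provide.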

2) "cores" here refer to EMR core nodes and are not the same as spark executor cores. Same for "task" that in the monitoring tab refer to EMR task nodes. That is consistent with my setup, as I have 3 EMR slave nodes.

Upvotes: 0
