Reputation: 624
I am on CDH 5.7.0 and I am seeing a strange issue with Spark 2 running on a YARN cluster. Below is my job submit command:
spark2-submit --master yarn --deploy-mode cluster --conf "spark.executor.instances=8" --conf "spark.executor.cores=4" --conf "spark.executor.memory=8g" --conf "spark.driver.cores=4" --conf "spark.driver.memory=8g" --class com.learning.Trigger learning-1.0.jar
Even though I have limited the resources my job can use, I can see that the resource utilization is higher than the allocated amount.
The job starts with a baseline memory consumption of around 8 GB and then eats up the whole cluster.
I do not have dynamic allocation set to true.
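As far as I understand, CDH ships Spark 2 with spark.dynamicAllocation.enabled set to true by default, so "not set to true" may still mean the default is in effect and executors keep getting added beyond spark.executor.instances. A submit command that explicitly pins it off would look something like this (just a sketch, I have not verified this fixes the issue):
spark2-submit --master yarn --deploy-mode cluster --conf "spark.dynamicAllocation.enabled=false" --conf "spark.executor.instances=8" --conf "spark.executor.cores=4" --conf "spark.executor.memory=8g" --conf "spark.driver.cores=4" --conf "spark.driver.memory=8g" --class com.learning.Trigger learning-1.0.jar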
I am just triggering an INSERT OVERWRITE query through SparkSession.
Any pointers would be very helpful.
Upvotes: 1
Views: 173
Reputation: 1058
I created a resource pool in the cluster and assigned it some resources:
Min Resources: 4 virtual cores and 8 GB memory
Then I submitted the Spark job to this pool to limit the resources (vcores and memory) it can use.
e.g. spark2-submit --class org.apache.spark.SparkProgram.rt_app --master yarn --deploy-mode cluster --queue rt_pool_r1 /usr/local/abc/rt_app_2.11-1.0.jar
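Note that Min Resources only guarantees a floor for the pool; to actually cap what jobs in the pool can consume, a Max Resources setting is needed as well. Under the YARN Fair Scheduler (the CDH default), the pool definition would look roughly like the sketch below in fair-scheduler.xml; the maxResources values here are just example numbers, and in CDH this is typically configured through Cloudera Manager's Dynamic Resource Pools page rather than by editing the XML directly:
<allocations>
  <queue name="rt_pool_r1">
    <!-- guaranteed minimum for the pool, as configured above -->
    <minResources>8192 mb,4 vcores</minResources>
    <!-- hard cap on what jobs in this pool can consume (example values) -->
    <maxResources>16384 mb,8 vcores</maxResources>
  </queue>
</allocations>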
If anyone has a better option to achieve the same, please let us know.
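One alternative I have been meaning to try (untested on this cluster, so treat it as an assumption): keep dynamic allocation on but cap the executor count with spark.dynamicAllocation.maxExecutors, e.g.
spark2-submit --class org.apache.spark.SparkProgram.rt_app --master yarn --deploy-mode cluster --queue rt_pool_r1 --conf "spark.dynamicAllocation.maxExecutors=8" /usr/local/abc/rt_app_2.11-1.0.jar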
Upvotes: 0