Reputation: 2194
Sometimes at work I need to use our cluster to run something, but it is at 100% utilization, because certain jobs scale up when resources become available, and then my job won't execute for a long time. Is it possible to limit the resources of a running app? Or should we choose a different scheduling policy, and if so, which one?
We use Capacity Scheduler.
Upvotes: 0
Views: 722
Reputation: 1232
It depends on what your apps are: is 100% of the load coming from large queries (Hive apps), or from, say, Spark apps?
Spark can easily eat up the whole cluster even while doing almost nothing, which is why you need to define how many CPUs to give to those apps, how much executor memory, driver memory, etc.
You accomplish that when you do the spark-submit, e.g.:
spark-submit --master yarn --deploy-mode cluster --queue {your yarn queue} --driver-cores 1 --driver-memory 1G --num-executors 2 --executor-cores 1 --executor-memory 2G {program name}
That will limit the application to only those resources (plus a small memory overhead).
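If the jobs that grow to fill the cluster are Spark apps with dynamic allocation enabled (that is what makes them scale up whenever resources are free), you can also cap how far they grow instead of fixing the executor count. A minimal sketch; the queue name, executor cap, and program name are placeholders:

spark-submit --master yarn --deploy-mode cluster \
  --queue {your yarn queue} \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=4 \
  {program name}

With maxExecutors set, the app still shrinks and grows with demand, but it can never claim more than that many executors from the queue.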
If you have a more complicated environment, then you need to limit by queue. For example, give queue1 a capacity of 20% of the cluster and also cap its maximum at 20%; by default a queue's maximum capacity is effectively 100%, so queue1 can grow to take the whole cluster when nobody else is using it.
Ideally, you should have several queues with the right limits in place and be really careful with preemption.
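As a rough sketch of what that looks like in capacity-scheduler.xml (the queue names and percentages here are made up; adjust them to your own layout, and note that sibling capacities must sum to 100):

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,queue1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>80</value>
</property>
<!-- Guaranteed share for queue1 -->
<property>
  <name>yarn.scheduler.capacity.root.queue1.capacity</name>
  <value>20</value>
</property>
<!-- Hard cap: without this, the queue can expand to 100% when the cluster is idle -->
<property>
  <name>yarn.scheduler.capacity.root.queue1.maximum-capacity</name>
  <value>20</value>
</property>

After editing the file, apply the changes with yarn rmadmin -refreshQueues.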
Upvotes: 1