Reputation: 2638
We are currently running a large number of Oozie jobs in our cluster.
Many of those jobs use templates and have sub-workflows.
These workflows rarely contain large, heavy jobs; most of them just run a small shell script.
The Hue job browser shows lots and lots of Oozie steps.
We now sometimes feel that our cluster is getting overloaded by these jobs. This made me wonder: does every one of those Oozie jobs get a YARN container assigned to it?
If so, this would mean that for a 2-minute job we are effectively using 2-10 times more resources than required.
Upvotes: 0
Views: 1888
Reputation: 9067
Just see for yourself...
job_000000_0000
refers to a YARN job
oozie job -info <wkf/sub-wkf exec id>
You can get more details in that post for instance.
So you can reduce the footprint of your Oozie actions by setting some undocumented properties -- in practice, standard Hadoop props prefixed by oozie.launcher.
See for instance this post then that post.
PS: oozie.launcher.mapreduce.map.java.opts is relevant for a Java action (or a Pig action, a Sqoop action, etc.) and should stay consistent with the global RAM quota; but it's not relevant for a Shell action [unless you set a really goofy value, in which case it might affect the Oozie bootstrap process]
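To make the oozie.launcher. prefix concrete, here is a minimal sketch that takes standard Hadoop property names and renders them as a launcher configuration fragment you could paste into an action's configuration block. The helper names (launcher_properties, to_configuration_xml) and the 512 MB values are purely illustrative assumptions, not recommendations; pick values that match your cluster's minimum allocation.

```python
from xml.sax.saxutils import escape

# Prefix that makes a standard Hadoop property apply to the Oozie launcher job.
LAUNCHER_PREFIX = "oozie.launcher."


def launcher_properties(hadoop_props):
    """Return a copy of hadoop_props with every name prefixed for the launcher."""
    return {LAUNCHER_PREFIX + name: str(value) for name, value in hadoop_props.items()}


def to_configuration_xml(props):
    """Render properties as a Hadoop-style <configuration> fragment."""
    lines = ["<configuration>"]
    for name, value in sorted(props.items()):
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Hypothetical sizes for a launcher that only runs a small shell script.
small_launcher = launcher_properties({
    "mapreduce.map.memory.mb": 512,
    "yarn.app.mapreduce.am.resource.mb": 512,
})

print(to_configuration_xml(small_launcher))
```

The same prefixed names can also be set once in job.properties instead of per action, if you want them to apply to every action of a workflow.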
Upvotes: 1
Reputation: 1334
In your case, yes: every job still gets a container, even if you are invoking MR through a shell script. It is not true, however, that YARN hands each container unnecessary memory or resources.
YARN allocates exactly what is requested or slightly more, and the allocation grows only if the job requires more.
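As a rough illustration of "exact or a little more": with the Capacity Scheduler, a memory request is typically rounded up to a multiple of yarn.scheduler.minimum-allocation-mb. The sketch below assumes that behaviour and the stock defaults (1024 MB minimum, 8192 MB maximum); the function name allocated_mb is made up for this example, and the Fair Scheduler rounds with a separate increment setting, so treat the numbers as indicative only.

```python
import math


def allocated_mb(requested_mb, minimum_allocation_mb=1024, maximum_allocation_mb=8192):
    """Estimate the container size YARN grants for a memory request.

    Assumes the request is rounded up to the next multiple of the
    minimum allocation, and rejected if it exceeds the maximum.
    """
    if requested_mb > maximum_allocation_mb:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    # At least one allocation step is always granted.
    steps = max(1, math.ceil(requested_mb / minimum_allocation_mb))
    return steps * minimum_allocation_mb


for request in (512, 1024, 1536):
    print(request, "->", allocated_mb(request))
```

Under these assumptions a 512 MB request still occupies 1024 MB, which is why many tiny launcher containers can add up on a busy cluster.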
Upvotes: 0