Havnar

Reputation: 2638

Does Oozie use YARN containers?

We are currently running a large number of Oozie jobs in our cluster.

Many of those jobs use templates and have sub-workflows.

These workflows don't usually contain large, heavy steps; most of them just run a small shell script.

The Hue job browser shows lots and lots of Oozie steps.

We now sometimes feel that our cluster is getting overloaded by these jobs. This made me wonder: does every one of those Oozie jobs get a YARN container allocated to it?

If so, this would mean that for a 2-minute job we are effectively using 2-10 times more resources than required.

Upvotes: 0

Views: 1888

Answers (2)

Samson Scharfrichter

Reputation: 9067

Just see for yourself...

  • in the Hue Dashboard, click on any Workflow that has been executed, select the "Actions" tab, look at the "External ID" column => every job_000000_0000 refers to a YARN job
  • ...and when "External ID" points to a Sub-Workflow, then if you click, you will get its own YARN jobs
  • alternatively, you can use the command line: oozie job -info <wkf/sub-wkf exec id>

You can get more details in that post for instance.


A frequent issue with Shell or Java actions is that the "launcher" YARN job uses the default job settings defined by your Hadoop admin -- e.g. 1 GB of RAM for the AppMaster and 1.5 GB for the "launcher".
But typically your shell script requires just a few MB of RAM (on top of what Oozie uses to bootstrap the Action in a raw YARN container), and its AppMaster requires just the bare minimum to control the execution -- say, 512 MB each.

So you can reduce the footprint of your Oozie actions by setting some undocumented properties -- in practice, standard Hadoop props prefixed by oozie.launcher.
See for instance this post then that post.
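
As a rough sketch of what that can look like for a Shell action: the fragment below takes two standard Hadoop properties (mapreduce.map.memory.mb and yarn.app.mapreduce.am.resource.mb), adds the oozie.launcher. prefix, and sets both to the 512 MB figure mentioned above. The action name, the script name, the transition targets, the schema version and the ${jobTracker} / ${nameNode} parameters are placeholders assumed for illustration; adapt them to your own workflow.

```xml
<!-- Hypothetical Shell action; names and the schema version are assumptions -->
<action name="small-shell-step">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- RAM requested for the container of the "launcher" job -->
      <property>
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>512</value>
      </property>
      <!-- RAM requested for the AppMaster of the "launcher" job -->
      <property>
        <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
        <value>512</value>
      </property>
    </configuration>
    <!-- my_script.sh is a placeholder for your own script -->
    <exec>my_script.sh</exec>
    <file>my_script.sh</file>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

Keep in mind that YARN normally rounds each request up to the scheduler's minimum allocation (yarn.scheduler.minimum-allocation-mb), so asking for less than that minimum is unlikely to shrink the footprint any further.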

PS: oozie.launcher.mapreduce.map.java.opts is relevant for a Java action (or a Pig action, a Sqoop action, etc.) and should stay consistent with the global RAM quota; but it's not relevant for a Shell action [unless you set a really goofy value, in which case it might affect the Oozie bootstrap process]

Upvotes: 1

Bector

Reputation: 1334

In your case, yes: all jobs will still get a container if you are invoking MR through a shell. It's not true that YARN will give each container unnecessary memory or resources.

YARN provides exactly the resources needed, or a little more, and the amount increases if the job requires more.

Upvotes: 0
