user2458922

Reputation: 1721

Oozie to Trigger a MapReduce Main Class

Instead of setting up the job driver configuration (InputFormat, mapper class, etc.) in the Oozie workflow XML, I want to call my ExampleDecision.java. The main method of ExampleDecision would take care of job setup and configuration. How can I do that?

Would the "II Java-Main Action" decision-nodes illustration in OOZIE_COOK_BOOK be equivalent to running java ExampleDecision <strings...>, i.e. submitting the job as a normal Java program,

or

would it be equivalent to hadoop jar SomeJar.ExampleDecision <strings...>?

Upvotes: 0

Views: 612

Answers (1)

suresiva

Reputation: 3173

Yes, you can write the map/reduce code in Java so that the main class takes care of configuring and dispatching the job as usual.

You can then use the java action tag in the Oozie workflow to invoke the main class in the jar.

Here the main class dispatches the map/reduce job, and the job itself runs just as it would with the approach that uses the map-reduce tag.

The main thing to ensure is that the main class dispatches the job with job.waitForCompletion(true), rather than a non-blocking call such as job.submit().

The reason is that this holds the Oozie execution on the java action node until the map/reduce job dispatched by the main class has completed.
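As a rough illustration of that blocking pattern, here is a minimal sketch. To keep it compilable without Hadoop on the classpath, it uses a hypothetical BlockingJob interface as a stand-in for org.apache.hadoop.mapreduce.Job; the runAndWait helper and the placeholder job are likewise assumptions made for the example, not part of Oozie or Hadoop. It also throws an exception on failure instead of calling System.exit, since, as far as I understand the Oozie java action documentation, an exception thrown from main is what triggers the error transition.

```java
public class ExampleDecision {

    /**
     * Hypothetical stand-in for the single call this sketch needs from
     * org.apache.hadoop.mapreduce.Job, so the example compiles without Hadoop.
     */
    public interface BlockingJob {
        boolean waitForCompletion(boolean verbose) throws Exception;
    }

    /**
     * Blocks until the job finishes and throws if it failed, so that the
     * java action only completes normally when the job succeeded.
     */
    public static void runAndWait(BlockingJob job) throws Exception {
        boolean succeeded = job.waitForCompletion(true);
        if (!succeeded) {
            throw new IllegalStateException("map/reduce job did not complete successfully");
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder job that reports success immediately (assumption for the sketch).
        // In a real driver you would configure a Hadoop Job here and pass
        // job::waitForCompletion instead.
        BlockingJob placeholderJob = verbose -> true;
        runAndWait(placeholderJob);
        System.out.println("Job finished, returning normally from main");
    }
}
```

With a real Hadoop Job, the only change should be building the job from your configuration and calling runAndWait(job::waitForCompletion); the blocking and failure handling stay the same.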

Edit:-

The differences between running via hadoop jar and via plain java are:

  • The hadoop command sets up a few environment properties, such as JAVA_HOME, HADOOP_HOME and HADOOP_OPTS, before the mapreduce job runs. You have most likely already defined these in your environment variables, in which case running with the java command causes no problems.

  • When you use the java action to invoke a mapreduce job from an Oozie workflow, Oozie cannot collect statistics or counters for the dispatched mapreduce job, because the actual MR job is spawned from the container dispatched for the java action.

So the java action node executed by Oozie runs in a separate container (a MapTask) that holds only the driver class, which prepares the job and waits for it to complete. The Oozie workflow in turn waits for the java action's MapTask to complete. You can see the job ID of the spawned mapreduce job with the oozie job -info command.

Hope this helps.

Upvotes: 1
