Reputation: 1721
Instead of setting up the job driver configuration (InputFormat, mapper class, etc.) in the Oozie workflow XML, I want to call my ExampleDecision.java. The main method of ExampleDecision would take care of job setup and configuration. How can I do that?
Would the "II Java-Main Action" decision nodes illustration in OOZIE_COOK_BOOK be equivalent to running java ExampleDecision <strings...> and submitting the job as a normal Java job, or would it be equivalent to hadoop jar SomeJar.ExampleDecision <strings...>?
Upvotes: 0
Views: 612
Reputation: 3173
Yes, you can write the map/reduce code in Java so that the main class takes care of configuring and dispatching the job as usual.
You can then use the java action tag in the Oozie workflow to invoke the main class in the JAR.
The main class dispatches the map/reduce job, and the job itself runs just as it would with the map-reduce tag.
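For illustration, a java action that calls the main class might look roughly like the fragment below. This is only a sketch: the action name, the transition targets, the ${...} parameters and the two arguments are placeholders that do not come from the question, and the JAR containing ExampleDecision is assumed to sit in the workflow application's lib/ directory.

```xml
<!-- Sketch only: names and ${...} parameters are hypothetical placeholders. -->
<action name="example-decision">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Class whose main(String[]) configures and dispatches the job -->
        <main-class>ExampleDecision</main-class>
        <!-- Each <arg> is passed to main() in order -->
        <arg>${inputDir}</arg>
        <arg>${outputDir}</arg>
    </java>
    <!-- "end" and "fail" must be nodes defined elsewhere in the workflow -->
    <ok to="end"/>
    <error to="fail"/>
</action>
```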
The main thing to ensure is that the main class dispatches the job with job.waitForCompletion(true) and not with a non-blocking call such as job.submit().
This holds the Oozie execution at the Java action node until the map/reduce job dispatched by the main class has completed.
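A minimal sketch of such a driver is shown below, assuming the Hadoop MapReduce client libraries are on the classpath. The class name ExampleDecision comes from the question; the two-argument layout, the job name, and the identity mapper and reducer are placeholders chosen only to keep the example short, so substitute your own classes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExampleDecision {

    public static void main(String[] args) throws Exception {
        // Hypothetical argument layout: <input path> <output path>.
        if (args.length != 2) {
            throw new IllegalArgumentException(
                    "usage: ExampleDecision <input path> <output path>");
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "example-decision");
        job.setJarByClass(ExampleDecision.class);

        // All job setup lives in the driver instead of the workflow XML.
        job.setInputFormatClass(TextInputFormat.class);
        // Placeholders: the base Mapper and Reducer classes pass records
        // through unchanged. Replace them with your own implementations.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        // TextInputFormat yields (byte offset, line) pairs, which the
        // identity mapper and reducer forward as-is.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Block until the job finishes so that the Oozie java action does
        // not move on while the MapReduce job is still running.
        boolean succeeded = job.waitForCompletion(true);
        if (!succeeded) {
            // An uncaught exception makes main() exit abnormally, which is
            // how this sketch signals failure to the caller.
            throw new RuntimeException("MapReduce job failed: " + job.getJobID());
        }
    }
}
```

Throwing on failure, instead of returning silently, is a design choice in this sketch so that a failed job does not look like a successful action.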
Edit:
The differences between calling hadoop jar and java -jar are as follows.
The hadoop command sets up a few environment properties, such as JAVA_HOME, HADOOP_HOME and HADOOP_OPTS, before the MapReduce job is executed. You have most likely already defined these in your environment variables, so running with the java command should not cause any problem.
When you use the java action to invoke a MapReduce job from an Oozie workflow, Oozie cannot collect statistics or counters for the dispatched MapReduce job, because the actual MR job is spawned from the container launched for the java action.
The Java action node executed by Oozie runs in a separate container (a MapTask). That container only runs the driver class, which prepares the job and waits until it completes; the Oozie workflow in turn waits for the java action's MapTask to complete. You can see the job ID of the spawned MapReduce job in the output of the oozie job -info command.
Hope this helps.
Upvotes: 1