Reputation: 6465
I have a Hadoop cluster running Hadoop 2.6. I'd like to submit multiple jobs to it in parallel. I'd like to know whether I should simply submit multiple jobs and let the cluster handle the rest, or whether I should write them as a YARN application. As a matter of fact, I'm not very familiar with YARN application development and don't know exactly how it differs from a regular Hadoop application.
Upvotes: 0
Views: 770
Reputation: 934
You can run MR jobs on both MR1 and YARN; YARN has nothing to do with job parallelism.
It is just a framework for running various kinds of jobs.
Use Oozie workflows or shell scripts to run the jobs in parallel.
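For the shell-script route, here is a minimal sketch (the jar names, driver classes, and HDFS paths are placeholders). Each hadoop jar invocation blocks until its job finishes, so launch the jobs in the background and wait for all of them; YARN then schedules the containers concurrently as cluster capacity allows.
# launch two jobs in the background (hypothetical jars, classes, and paths)
hadoop jar job1.jar com.example.Job1 /input1 /output1 &
hadoop jar job2.jar com.example.Job2 /input2 /output2 &
wait   # returns once both background jobs have completed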
Upvotes: 1
Reputation: 794
You can define an Oozie workflow in which the MapReduce jobs are forked. The following example, from the Apache Oozie documentation, runs two jobs in parallel and joins once both finish.
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="forking">
        <path start="firstparalleljob"/>
        <path start="secondparalleljob"/>
    </fork>
    <action name="firstparalleljob">
        <map-reduce>
            <job-tracker>foo:9001</job-tracker>
            <name-node>bar:9000</name-node>
            <job-xml>job1.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <action name="secondparalleljob">
        <map-reduce>
            <job-tracker>foo:9001</job-tracker>
            <name-node>bar:9000</name-node>
            <job-xml>job2.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <join name="joining" to="nextaction"/>
    ...
</workflow-app>
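To submit the workflow, the usual steps are roughly as follows (host names, ports, and HDFS paths below are placeholders): upload workflow.xml to an application directory on HDFS, point oozie.wf.application.path in a job.properties file at that directory, and run it with the Oozie CLI.
# upload the workflow definition (placeholder path)
hdfs dfs -put workflow.xml /user/me/sample-wf/
# job.properties should set at least:
#   nameNode=hdfs://bar:9000
#   jobTracker=foo:9001
#   oozie.wf.application.path=${nameNode}/user/me/sample-wf
# submit and start the workflow (placeholder Oozie server URL)
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run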
Upvotes: 0