Boolean

Reputation: 14664

Pipelining Hadoop map reduce jobs

I have five map reduce jobs that I am currently running separately, and I want to pipeline them all together so that the output of one job goes to the next. At the moment I use a shell script to execute them all. Is there a way to write this in Java? Please provide an example.

Thanks

Upvotes: 1

Views: 3544

Answers (5)

Neha Kumari

Reputation: 787

For your use case, I think Oozie will be a good fit. Oozie is a workflow scheduler in which you can write different actions (map-reduce, java, shell, etc.) to perform compute, transformation, enrichment, and so on. For this case:

action A: reads the input, writes a
action B: reads a, writes b
action C: reads b, writes c (the final output)

You can finally persist c in HDFS, and can decide to persist or delete intermediate outputs.
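As a rough sketch, the A → B → C chain above would look something like this in a workflow.xml. The action names, paths, and ${...} parameters here are made-up placeholders, and actions B and C are elided since they have the same shape as A:

<workflow-app name="mr-pipeline" xmlns="uri:oozie:workflow:0.4">
    <start to="A"/>
    <action name="A">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- plus mapper/reducer class properties for this step -->
                <property><name>mapred.input.dir</name><value>${input}</value></property>
                <property><name>mapred.output.dir</name><value>${outputA}</value></property>
            </configuration>
        </map-reduce>
        <!-- on success, a becomes the input of action B -->
        <ok to="B"/>
        <error to="fail"/>
    </action>
    <!-- actions B and C follow the same pattern: B reads ${outputA}, C reads ${outputB} -->
    <kill name="fail">
        <message>Pipeline failed at [${wf:errorNode()}]</message>
    </kill>
    <end name="end"/>
</workflow-app>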

If you want the computation done by all three actions in a single job, you can use Cascading. You can learn more about Cascading from its official documentation, and you can also refer to my blog on the same: https://tech.flipkart.com/expressing-etl-workflows-via-cascading-192eb5e7d85d

Upvotes: 0

Jeff Hammerbacher

Reputation: 4236

You may find JobControl to be the simplest method for chaining these jobs together. For more complex workflows, I'd recommend checking out Oozie.
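To make that concrete, here is a minimal sketch using the org.apache.hadoop.mapreduce.lib.jobcontrol classes. It chains the first two of the five jobs, and the remaining three follow the same pattern; the paths, job names, and mapper/reducer setup are placeholders you would replace with your own:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Pipeline {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job1 = Job.getInstance(conf, "step1");
        // job1.setMapperClass(...); job1.setReducerClass(...); etc.
        FileInputFormat.addInputPath(job1, new Path("/data/input"));
        FileOutputFormat.setOutputPath(job1, new Path("/data/step1"));

        Job job2 = Job.getInstance(conf, "step2");
        // step 2 reads whatever step 1 wrote
        FileInputFormat.addInputPath(job2, new Path("/data/step1"));
        FileOutputFormat.setOutputPath(job2, new Path("/data/step2"));

        ControlledJob cj1 = new ControlledJob(job1, null);
        ControlledJob cj2 = new ControlledJob(job2, null);
        cj2.addDependingJob(cj1); // job2 starts only after job1 succeeds

        JobControl control = new JobControl("five-job-pipeline");
        control.addJob(cj1);
        control.addJob(cj2);

        // JobControl is a Runnable: run it on its own thread and poll for completion
        Thread runner = new Thread(control);
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(5000);
        }
        control.stop();
    }
}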

Upvotes: 3

davek

Reputation: 22905

Another possibility is Cascading, which also provides an abstraction layer on top of Hadoop: it seems to offer a similar combination of working closely with Hadoop concepts while still letting Hadoop do the M/R heavy lifting, much like what you get from Oozie workflows calling Pig scripts.

Upvotes: 0

user656189

Reputation: 137

Oozie is the solution for you. You can submit map-reduce jobs, Hive jobs, Pig jobs, system commands, etc. through Oozie's action tags.

It even has a coordinator, which acts as a cron for your workflow.
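For instance, a minimal coordinator.xml that fires the workflow once a day could look roughly like this; the name, dates, and app path are made-up placeholders:

<coordinator-app name="daily-pipeline" frequency="${coord:days(1)}"
                 start="2013-01-01T00:00Z" end="2014-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <!-- HDFS directory containing the workflow.xml to run -->
            <app-path>${nameNode}/user/me/mr-pipeline</app-path>
        </workflow>
    </action>
</coordinator-app>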

Upvotes: 1

Singleton

Reputation: 71

Hi, I had a similar requirement. One way to do this is:

after submitting the first job, execute the following

Job job1 = new Job( getConf() );
job1.waitForCompletion( true );

and then check the status using

if (job1.isSuccessful()) {
    // start another job with a different Mapper
    // change the config as needed
    Job job2 = new Job( getConf() );
}

Upvotes: 2
