Tom Sebastian

Reputation: 3433

How to run MapReduce jobs sequentially, one after another

I am running a single-node cluster and handling time-series data. I have a set of MapReduce jobs that run periodically (using a Quartz CronTrigger) from a client application. For example:

job1: runs every 10 minutes, priority VERY_HIGH
job2: runs every hour (it takes its input from the output of job1), priority HIGH
job3: runs every day (it takes its input from the output of job2), priority NORMAL

.....
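For illustration, each job is submitted roughly like this (a minimal sketch using the old mapred API; the class name and paths are simplified placeholders, and only the priority call matters here):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobPriority;

    public class SubmitJob1 {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitJob1.class);
            conf.setJobName("job1-every-10-min");
            // job1 gets VERY_HIGH; job2 and job3 use HIGH and NORMAL
            conf.setJobPriority(JobPriority.VERY_HIGH);
            FileInputFormat.setInputPaths(conf, new Path("/timeseries/raw"));
            FileOutputFormat.setOutputPath(conf, new Path("/timeseries/10min"));
            // mapper/reducer classes omitted in this sketch
            JobClient.runJob(conf); // blocks until this job completes
        }
    }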

Everything works fine. But sometimes multiple jobs are triggered simultaneously; for example, at 00:00 job1, job2, and job3 will all be triggered. Even though the job priorities are set, the jobs end up running in parallel because enough map slots are available, so the lower-priority jobs miss some of their input data.

In brief: I need the jobs to execute strictly FIFO, ordered by job priority. That means execution should be restricted so that only a single job runs at a time: job1 finishes, then job2 finishes, then job3, and so on.

I don't know how the Hadoop schedulers can help me here. Please advise.

Upvotes: 0

Views: 1028

Answers (2)

user3535019

Reputation: 1

I've been working on a new workflow engine called Soop: https://github.com/radixCSgeek/soop. It is very lightweight and simple to set up and run, using a cron-like syntax. You can specify job dependencies (including virtual dependencies between jobs), and the DAG engine will make sure to execute them in the right order.

Upvotes: 0

Badal Singh

Reputation: 918

Try changing these settings to 1:

mapred.tasktracker.map.tasks.maximum = 1
mapred.tasktracker.reduce.tasks.maximum = 1
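In mapred-site.xml that would look roughly like this (a sketch for a Hadoop 1.x single-node setup; the TaskTracker must be restarted for the change to take effect):

    <!-- cap the TaskTracker at one map slot and one reduce slot -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>1</value>
    </property>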

If you limit the number of map and reduce slots to 1, the next job has to wait for the running task to finish. But as you can see, this is not a good solution: it also removes all parallelism within each individual job.

Using the Oozie workflow engine would best suit your need.
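For example, a workflow.xml that chains job1 and job2 so each action starts only after the previous one succeeds might look roughly like this (a sketch; the mapper/reducer classes and paths are placeholders, and job3 would follow the same pattern):

    <workflow-app xmlns="uri:oozie:workflow:0.2" name="timeseries-chain">
        <start to="job1"/>
        <action name="job1">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.mapper.class</name>
                        <value>com.example.Job1Mapper</value>
                    </property>
                    <property>
                        <name>mapred.reducer.class</name>
                        <value>com.example.Job1Reducer</value>
                    </property>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>/timeseries/raw</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>/timeseries/10min</value>
                    </property>
                </configuration>
            </map-reduce>
            <!-- job2 runs only after job1 succeeds -->
            <ok to="job2"/>
            <error to="fail"/>
        </action>
        <action name="job2">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.mapper.class</name>
                        <value>com.example.Job2Mapper</value>
                    </property>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>/timeseries/10min</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>/timeseries/hourly</value>
                    </property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
        </kill>
        <end name="end"/>
    </workflow-app>

An Oozie coordinator can then trigger this workflow on a schedule, replacing the Quartz triggers entirely.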

Upvotes: 1
