Reputation: 3433
I am running a single-node cluster and handling time-series data. I have a set of MapReduce jobs that run periodically (using a Quartz CronTrigger) from a client application. For example,
job1: runs every 10 minutes, priority VERY_HIGH
job2: runs every hour (takes input from the output of job1), priority HIGH
job3: runs every day (takes input from the output of job2), priority NORMAL
.....
Everything works fine. But sometimes multiple jobs are triggered simultaneously; for example, at 00:00 job1, job2, and job3 are all triggered. Even though job priorities are set, the jobs end up running in parallel because enough map slots are available, so the lower-priority jobs miss some of their input data.
In brief: I need the jobs to execute strictly in FIFO order based on job priority. That means execution should be restricted so that only a single job runs at a time, i.e. job1 finishes, then job2 finishes, then job3, and so on.
I don't know how the Hadoop schedulers can help me here. Please advise.
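For illustration, here is a minimal sketch of the strictly serial behaviour I'm after (class and method names are mine, not production code), using the standard org.apache.hadoop.mapreduce.Job API, where waitForCompletion(true) blocks until a job finishes:

    import org.apache.hadoop.mapreduce.Job;

    // Hypothetical sketch: submit jobs one after another so each
    // finishes before the next starts (strict FIFO, no overlap).
    public class SerialRunner {
        public static void runInOrder(Job... jobs) throws Exception {
            for (Job job : jobs) {
                // waitForCompletion(true) blocks until the job completes
                // (and prints progress), so jobs can never run in parallel.
                if (!job.waitForCompletion(true)) {
                    throw new RuntimeException("Job failed: " + job.getJobName());
                }
            }
        }
    }

The problem is that my Quartz triggers fire independently, so there is no single place like this that serializes the submissions.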
Upvotes: 0
Views: 1028
Reputation: 1
I've been working on a new workflow engine called Soop: https://github.com/radixCSgeek/soop. It is very lightweight and simple to set up and run, using a cron-like syntax. You can specify job dependencies (including virtual dependencies between jobs), and the DAG engine will make sure to execute them in the right order.
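As a rough illustration of what a DAG engine does under the hood (a generic sketch in plain Java, not Soop's actual API), a dependency-ordered run is just a topological sort:

    import java.util.*;

    // Generic sketch: run each job only after all jobs it depends on
    // have run. Assumes the dependency graph is acyclic (no cycle check).
    class DagRunner {
        static List<String> order(Map<String, List<String>> deps) {
            List<String> sorted = new ArrayList<>();
            Set<String> visited = new HashSet<>();
            for (String job : deps.keySet()) visit(job, deps, visited, sorted);
            return sorted; // e.g. [job1, job2, job3]
        }
        static void visit(String job, Map<String, List<String>> deps,
                          Set<String> visited, List<String> sorted) {
            if (!visited.add(job)) return;
            for (String d : deps.getOrDefault(job, Collections.emptyList()))
                visit(d, deps, visited, sorted);
            sorted.add(job); // append after all dependencies
        }
    }

With deps = {job2: [job1], job3: [job2]}, the order is always job1, job2, job3, which is exactly the FIFO chain you described.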
Upvotes: 0
Reputation: 918
Try changing these settings to 1:
mapred.tasktracker.map.tasks.maximum = 1
mapred.tasktracker.reduce.tasks.maximum = 1
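In mapred-site.xml (Hadoop 1.x TaskTracker settings; the TaskTracker must be restarted for them to take effect) that looks like:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>1</value>
    </property>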
If you limit the number of map and reduce slots to 1, the next job has to wait for the running job's tasks to finish. But as you can see, this is not a good solution: it throttles the whole node to one task at a time instead of just serializing the jobs.
Using the Oozie workflow engine would best suit your needs.
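For example, a minimal Oozie workflow sketch (action bodies elided, names hypothetical) that chains the jobs so job2 starts only after job1 succeeds:

    <workflow-app name="timeseries-wf" xmlns="uri:oozie:workflow:0.4">
      <start to="job1"/>
      <action name="job1">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <!-- mapper/reducer configuration for job1 goes here -->
        </map-reduce>
        <ok to="job2"/>
        <error to="fail"/>
      </action>
      <action name="job2">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <!-- mapper/reducer configuration for job2 goes here -->
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Workflow failed at ${wf:lastErrorNode()}</message>
      </kill>
      <end name="end"/>
    </workflow-app>

An Oozie coordinator can then trigger this workflow on the cron-like schedule you currently drive from Quartz, replacing the independent triggers entirely.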
Upvotes: 1