Reputation: 14664
I use Hadoop to run MapReduce applications on our cluster. The jobs take around 10 hours to complete daily. I want to know the time taken by each job, the time taken by the longest job, etc., so that I can optimize those jobs. Is there a plugin or script that does this?
Thank you
Bala
Upvotes: 3
Views: 5776
Reputation: 494
I've written an open-source, non-intrusive tool called Hadoop Job Analyzer, which provides this functionality by aggregating the data according to user-specified views and sending it to a metrics backend for further analysis.
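For a sense of what that looks like, here is a minimal Python sketch of the same idea (my own illustration, not the tool's actual code): compute each job's duration and push it to a metrics backend, here Graphite, whose plaintext protocol accepts `name value timestamp` lines on TCP port 2003. The host name and the job records are hypothetical; in practice they would come from the JobTracker or the job history logs.

```python
import socket
import time

# Hypothetical backend host; Graphite's carbon listener accepts plaintext
# lines of the form "metric.name value timestamp" on TCP port 2003.
GRAPHITE_HOST = "graphite.example.com"
GRAPHITE_PORT = 2003

def send_metric(name, value, timestamp=None):
    """Send one metric line to the Graphite carbon listener."""
    if timestamp is None:
        timestamp = int(time.time())
    line = "%s %s %d\n" % (name, value, timestamp)
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT)) as sock:
        sock.sendall(line.encode("ascii"))

# Hypothetical job records: (job name, start/finish as epoch seconds).
jobs = [
    ("daily_aggregation", 1300000000, 1300036000),
    ("log_cleanup",       1300000000, 1300003600),
]

for name, start, finish in jobs:
    # One duration metric per job run, keyed by job name.
    send_metric("hadoop.jobs.%s.duration_seconds" % name, finish - start)
```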
Harel
Upvotes: 0
Reputation: 697
These three web pages are very useful:
localhost:50030/jobtracker.jsp (JobTracker)
localhost:50060/tasktracker.jsp (TaskTracker)
localhost:50070/dfshealth.jsp (HDFS health)
There is also a Hyperic HQ plugin to measure the performance of the JobTracker and TaskTracker.
Upvotes: 0
Reputation: 2438
Take a look at http://<jobtracker-host>:50030 or http://<jobtracker-host>:50030/jobhistory.jsp (linked at the bottom of that page).
There is an analysis for each job/task/task phase (map, sort, reduce). Pretty handy. You could also write your own logs; I just wget all the analysis pages and put them through awk for crude statistics.
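If you prefer Python over wget and awk, here is a minimal sketch of the same crude scraping, assuming a Hadoop 1.x JobTracker on localhost:50030; the regular expression is a guess at the page's markup and may need tuning for your Hadoop version.

```python
import re
import urllib.request

# Assumed JobTracker host/port; adjust for your cluster.
URL = "http://localhost:50030/jobhistory.jsp"

html = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")

# Crude scrape: pull the job IDs out of the links to the per-job detail pages.
job_ids = sorted(set(re.findall(r"job_\d+_\d+", html)))

for job_id in job_ids:
    print(job_id)
```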
Upvotes: 4
Reputation: 13937
First, have you looked at the JobTracker UI that comes with Hadoop to track the progress of jobs? You should check all the standard counter statistics each job produces, as well as any custom counters you have added to a job.
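If you want those numbers without clicking through the UI, one option on Hadoop 1.x is the command-line client: `hadoop job -status <job-id>` prints completion info and the job's counters. A minimal Python wrapper might look like this; the `=` filter for counter lines is a heuristic, since the output format varies by version.

```python
import subprocess
import sys

# Assumes the Hadoop CLI is on the PATH.
job_id = sys.argv[1]  # e.g. job_201101010000_0001

output = subprocess.check_output(["hadoop", "job", "-status", job_id])

for line in output.decode().splitlines():
    if "=" in line:  # counter lines look like "Map input records=1000"
        print(line.strip())
```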
An interesting alternative might be to take a look at Cloudera Desktop.
I also found this article from Cloudera useful: 7 tips for improving MapReduce performance
Out of interest, are you optimizing your jobs because they are taking too long?
Upvotes: 1