Boolean

Reputation: 14664

Hadoop - job statistics

I use Hadoop to run MapReduce applications on our cluster. The jobs take around 10 hours to complete daily. I want to know the time taken by each job, the time taken by the longest job, etc., so that I can optimize those jobs. Is there any plugin or script that does this?

Thank you
Bala

Upvotes: 3

Views: 5776

Answers (4)

Harel Ben Attia

Reputation: 494

I've written an open-source, non-intrusive tool called Hadoop Job Analyzer, which provides this functionality by aggregating the data according to user-specified views and sending it to a metrics backend for further analysis.

Harel

Upvotes: 0

jmventar

Reputation: 697

The three web pages referenced above are very useful:

localhost:50030/jobtracker.jsp
localhost:50060/tasktracker.jsp
localhost:50070/dfshealth.jsp

There is also a Hyperic HQ plugin to measure the performance of the JobTracker and the TaskTracker.

Upvotes: 0

Leonidas

Reputation: 2438

Take a look at http://&lt;jobtracker-host&gt;:50030 or http://&lt;jobtracker-host&gt;:50030/jobhistory.jsp (the job history is linked at the bottom of that page).

There is an analysis for each job/task/task part (map, sort, reduce). Pretty handy. You could write your own logs - I just wget all the analysis pages and put them through awk for crude statistics.
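
If it helps, here is a minimal Java sketch of that scraping approach. The URL and the "Finished in:" pattern are assumptions for the Hadoop 1.x JobTracker pages, so adjust both to match what your analysis pages actually contain:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Crude scrape of a JobTracker history page, in the spirit of the
    // wget/awk approach above. Both the default URL and the regex are
    // guesses at the Hadoop 1.x page markup - adjust to your cluster.
    public class JobHistoryScrape {
        public static void main(String[] args) throws Exception {
            String page = args.length > 0 ? args[0]
                    : "http://localhost:50030/jobhistory.jsp";
            Pattern finished = Pattern.compile("Finished in: ([^<]+)");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(page).openStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    Matcher m = finished.matcher(line);
                    while (m.find()) {
                        // Prints one duration per job found on the page.
                        System.out.println(m.group(1));
                    }
                }
            }
        }
    }

Run it over each analysis page and sort the output to spot the longest jobs.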

Upvotes: 4

Binary Nerd

Reputation: 13937

First, have you looked at the JobTracker UI that comes with Hadoop to track the progress of jobs? You should also check all the standard counter statistics each job produces, as well as any custom counters you have added to a job.
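
For the custom counters, here is a minimal sketch using the new (org.apache.hadoop.mapreduce) API; the counter name and the tab-separated record layout are made up for illustration:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Minimal custom-counter sketch: increment a counter from the mapper,
    // then inspect it in the JobTracker UI alongside the built-in counters.
    // The counter and the input format here are hypothetical.
    public class CountingMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {

        enum MyCounters { MALFORMED_RECORDS }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 2) {
                // Counts bad input lines; visible per job in the UI.
                context.getCounter(MyCounters.MALFORMED_RECORDS).increment(1);
                return;
            }
            context.write(new Text(fields[0]), new LongWritable(1));
        }
    }

The counter shows up next to the built-in statistics in the JobTracker UI, and you can also read it programmatically via job.getCounters() once the job completes.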

An interesting alternative might be to take a look at Cloudera Desktop.

I also found this article from Cloudera useful: 7 tips for improving MapReduce performance

Out of interest, are you optimizing your jobs because they are taking too long?

Upvotes: 1
