SSaikia_JtheRocker

Reputation: 5063

How can I programmatically get all the JobTracker and TaskTracker information that Hadoop displays in its web interface?

I'm using Cloudera's Hadoop distribution CDH-0.20.2CDH3u0. Is there any way I could get information such as JobTracker status, TaskTracker status, and counters using a Java program running outside the Hadoop framework? I tried listening over JMX, but Hadoop exposes very little information about the JobTracker, TaskTracker, and DataNode. It doesn't provide any JMX attributes related to the state of a running job, such as map percent completion, reduce percent completion, task percent completion, attempt percent completion, or counter status.

Furthermore, I tried using the metrics logs dumped by Hadoop, but they don't contain any information about map/reduce percent completion or task percent completion either.

I think there should be some alternative way to get all of this information.

Please do reply.

Upvotes: 2

Views: 5530

Answers (2)

Matt Tenenbaum

Reputation: 1321

You can use the Hadoop API to access this information programmatically. In particular, instantiate JobClient with a suitable configuration for your cluster, and then call getJob on that instance to get a RunningJob. With that, you should be able to get to the detail you're looking for (the following code is completely untested, but I hope it points in the right direction):

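import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.RunningJob;

// Host name, port, and job ID are placeholders; these calls can throw IOException
JobClient theJobClient = new JobClient(new InetSocketAddress("your.job.tracker", 8021), new Configuration());
RunningJob theJob = theJobClient.getJob("job_id_string"); // caution, deprecated
float mapProgress = theJob.mapProgress(); // similar for reduceProgress; etc (see RunningJob)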

You can also get the list of currently running jobs with theJobClient.jobsToComplete(), which returns an array of JobStatus. Each JobStatus should expose similar values (mapProgress, etc.) and can provide the JobID instance you could pass to getJob to get the RunningJob above (if you want to avoid the deprecated method).
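A rough sketch of that second approach, equally untested against a live cluster. It assumes the Hadoop 0.20 jars are on the classpath and reuses the placeholder host and port from above; the class name JobProgressReport and the percent helper are made up for illustration. It lists the jobs that have not finished yet, prints their map and reduce progress, and dumps each job's counters:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Locale;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical class name; requires the Hadoop 0.20 jars on the classpath.
class JobProgressReport {

    // Hypothetical helper: turns a 0.0-1.0 progress fraction into a percentage string.
    static String percent(float fraction) {
        return String.format(Locale.ROOT, "%.1f%%", fraction * 100f);
    }

    public static void main(String[] args) throws IOException {
        // "your.job.tracker" and 8021 are placeholders for the JobTracker RPC address.
        JobClient client = new JobClient(
                new InetSocketAddress("your.job.tracker", 8021), new Configuration());
        try {
            // jobsToComplete() returns the jobs that are still pending or running.
            for (JobStatus status : client.jobsToComplete()) {
                System.out.println(status.getJobID()
                        + " map=" + percent(status.mapProgress())
                        + " reduce=" + percent(status.reduceProgress()));

                // Non-deprecated lookup by JobID; may return null if the job is gone.
                RunningJob job = client.getJob(status.getJobID());
                if (job != null) {
                    // Counters.toString() gives a readable multi-line summary.
                    System.out.println(job.getCounters());
                }
            }
        } finally {
            client.close();
        }
    }
}
```

Running this in a loop with a sleep between iterations would give you a simple poller for the same numbers the web interface shows.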

Surely there are further options. Start with http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobClient.html for further details.

Upvotes: 8

Kapil D

Reputation: 2660

I am not sure if this is correct, but you can try HUE. I think HUE gives information about jobs. Since it's open source, you can see how they access the JobTracker and NameNode.

Upvotes: 3
