How to relate tasks back to the machine they were run on in Hadoop

Question

I am working on a Hadoop project (currently using Hadoop 1.2.1) where I need to keep track of task runtime information and of which machines are performing tasks well. I am able to get task progress using the following:

RunningJob runningJob = JobClient.runJob(conf);
JobStatus jobStatus = runningJob.getJobStatus();

From here I can get a JobTracker and get map task progress:

TaskReport[] mapTaskReports = tracker.getMapTaskReports();

But now that I have the task reports, I am not sure how to find out which machines these tasks are or were running on. Is there any machine-identifying information (machine name, IP address, etc.) that I can retrieve and relate back to these task reports?

NOTE: I need to be able to do this mapping while a job is still in progress, so I can make decisions based on whether certain machines are performing poorly for certain tasks.

EDIT: I think that the TaskTracker object may have what I want, with its getHostName() method, but I am not sure how to get an instance of it. The TaskTracker constructor takes in a JobConf object, but it doesn't seem to specify which machine it will get it from, as each machine running a task for the job will have its own instance of the TaskTracker.

user1261215 · Accepted Answer

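RunningJob has a method called getTaskCompletionEvents(), which takes a starting event index and returns an array of TaskCompletionEvent objects.

From a TaskCompletionEvent we can get the HTTP address of the TaskTracker that ran the task, via getTaskTrackerHttp().

Please try the code below. It is sample code and has not been tested: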

TaskCompletionEvent[] events = runningJob.getTaskCompletionEvents(0);
for (TaskCompletionEvent event : events) {
    System.out.println(event.getTaskTrackerHttp()); // host:port format
}
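
Once you have those address strings, you will probably want to reduce them to a plain host name and group by machine. Below is a minimal sketch of that step using only the JDK, so it runs without a cluster. The class name TrackerHosts and both helper methods are made up for illustration, and the sample strings stand in for values you would read from event.getTaskTrackerHttp(). It assumes the value looks like either "http://host:port" or bare "host:port", and handles both.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: turns TaskTracker HTTP
// addresses into host names and tallies events per host.
class TrackerHosts {

    // Extracts the host from "http://host:port" or bare "host:port".
    // Falls back to the raw input if the JDK cannot parse a host from it.
    static String hostOf(String trackerHttp) {
        String withScheme = trackerHttp.contains("://")
                ? trackerHttp
                : "http://" + trackerHttp;
        String host = URI.create(withScheme).getHost();
        return host != null ? host : trackerHttp;
    }

    // Counts how many address strings map to each host,
    // keeping hosts in first-seen order.
    static Map<String, Integer> countByHost(Iterable<String> trackerHttpValues) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : trackerHttpValues) {
            counts.merge(hostOf(value), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample values standing in for event.getTaskTrackerHttp() results.
        Map<String, Integer> counts = countByHost(Arrays.asList(
                "http://node1:50060", "node1:50060", "http://node2:50060"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " ran " + e.getValue() + " task attempt(s)");
        }
    }
}
```

In the real job you would feed this from the loop above, and you could key on event.getTaskAttemptId() as well to tie each host back to a specific attempt. Also note that JobClient.runJob(conf) blocks until the job finishes, so to poll while the job is still running you would likely need to submit it with JobClient's submitJob(conf) instead and call getTaskCompletionEvents() periodically with an increasing start index.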
