Reputation: 20237
I am working on a Hadoop project (currently using Hadoop 1.2.1) where I need to keep track of task runtime information and of which machines are performing tasks well. I am able to get task progress using the following:
RunningJob runningJob = JobClient.runJob(conf);
JobStatus jobStatus = runningJob.getJobStatus();
From here I can get a JobTracker and get map task progress:
TaskReport[] mapTaskReports = tracker.getMapTaskReports();
But now that I have the task reports, I am not sure how to find out which machines these tasks are or were running on. Is there any machine-identifying information (machine name, IP address, etc.) that I can retrieve and relate back to these task reports?
NOTE: I need to be able to do this mapping while a job is still in progress, so I can make decisions based on whether certain machines are performing poorly for certain tasks.
EDIT: I think that the TaskTracker object may have what I want, with its getHostName() method, but I am not sure how to get an instance of it. The TaskTracker constructor takes in a JobConf object, but it doesn't seem to specify which machine it will get it from, as each machine running a task for the job will have its own instance of the TaskTracker.
Upvotes: 0
Views: 68
Reputation:
RunningJob has a method called getTaskCompletionEvents(), which returns an array of TaskCompletionEvent objects.
From each TaskCompletionEvent you can get the HTTP address of the TaskTracker that ran the task attempt.
Please try the code below. It is sample code and has not been tested.
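// Fetch completion events starting from event index 0
TaskCompletionEvent[] events = runningJob.getTaskCompletionEvents(0);
for (TaskCompletionEvent event : events) {
    // Print the task attempt ID next to the HTTP address (host and port) of its TaskTracker
    System.out.println(event.getTaskAttemptId() + " ran on " + event.getTaskTrackerHttp());
}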
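Once you have those addresses, grouping them by machine is plain Java. Here is a rough sketch of that step. TrackerHosts, hostOf and countByHost are made-up names (not part of Hadoop), and it assumes the address string looks like http://host:port or host:port, so check what your cluster actually returns. The sample addresses are invented, standing in for event.getTaskTrackerHttp() values.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: tallies completion events per TaskTracker host.
final class TrackerHosts {

    // Extracts the host from an address such as "http://node1:50060" or "node1:50060".
    // Assumption: the address has one of those two shapes.
    static String hostOf(String trackerHttp) {
        String withScheme = trackerHttp.contains("://") ? trackerHttp : "http://" + trackerHttp;
        String host = URI.create(withScheme).getHost();
        // Fall back to the raw string if the URI parser cannot identify a host.
        return host != null ? host : trackerHttp;
    }

    // Counts how many addresses map to each host, in first-seen order.
    static Map<String, Integer> countByHost(Iterable<String> trackerHttpAddresses) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String address : trackerHttpAddresses) {
            counts.merge(hostOf(address), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample values standing in for event.getTaskTrackerHttp().
        List<String> sample = Arrays.asList(
                "http://node1:50060", "http://node2:50060", "http://node1:50060");
        System.out.println(countByHost(sample)); // prints {node1=2, node2=1}
    }
}
```

In the loop above you would pass event.getTaskTrackerHttp() to hostOf() and keep whatever per-host numbers you care about (counts, failures, durations) instead of a simple count.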
Upvotes: 1