lehn0058

Reputation: 20237

How to relate tasks back to the machines they ran on in Hadoop

I am working on a Hadoop project (currently using Hadoop 1.2.1) where I need to keep track of task runtime information and which machines are performing tasks well. I am able to get task progress using the following:

RunningJob runningJob = JobClient.runJob(conf);
JobStatus jobStatus = runningJob.getJobStatus();

From here I can get a JobTracker and get map task progress:

TaskReport[] mapTaskReports = tracker.getMapTaskReports();

But now that I have the task reports, I am not sure how to find out which machines these tasks are or were running on. Is there any machine-identifying information that I can retrieve (machine name, IP address, etc.) and relate back to these task reports?

NOTE: I need to be able to do this mapping while a job is still in progress, so I can make decisions based on whether certain machines are performing poorly for certain tasks.

EDIT: I think that the TaskTracker object may have what I want, with its getHostName() method, but I am not sure how to get an instance of it. The TaskTracker constructor takes a JobConf object, but nothing seems to specify which machine the instance corresponds to, since each machine running a task for the job has its own TaskTracker instance.

Upvotes: 0

Views: 68

Answers (1)

user1261215

RunningJob has a method, getTaskCompletionEvents(), that returns an array of TaskCompletionEvent objects.

From a TaskCompletionEvent you can get the HTTP address of the TaskTracker that ran the task.

Try the code below. It is sample code and has not been tested.

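// Fetch completion events starting at index 0. To follow a live job,
// call again with the number of events already seen as the start index.
TaskCompletionEvent[] events = runningJob.getTaskCompletionEvents(0);
for (TaskCompletionEvent event : events) {
    // HTTP address of the TaskTracker, which includes its host and port
    System.out.println(event.getTaskTrackerHttp());
}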
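
To get from the tracker address to a machine name, the host part still has to be pulled out of the string. Below is a minimal sketch of that step in plain Java (8 or later), with no Hadoop dependency so it can be tried on its own. It assumes the address looks like either "http://host:port" or "host:port"; the class TrackerHosts, the helpers hostOf and countByHost, and the sample addresses are all hypothetical names made up for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TrackerHosts {

    /** Returns the host part of a tracker address such as "http://host:port" or "host:port". */
    static String hostOf(String trackerHttp) {
        String s = trackerHttp.trim();
        // Assumption: an address without a scheme is a bare "host:port" pair.
        if (!s.contains("://")) {
            s = "http://" + s;
        }
        String host = URI.create(s).getHost();
        if (host == null) {
            // java.net.URI gives no host for names it cannot parse (e.g. with underscores).
            throw new IllegalArgumentException("Cannot parse host from: " + trackerHttp);
        }
        return host;
    }

    /** Counts how many tracker addresses map to each host, in first-seen order. */
    static Map<String, Integer> countByHost(List<String> trackerAddresses) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String address : trackerAddresses) {
            counts.merge(hostOf(address), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for event.getTaskTrackerHttp() results.
        List<String> sample = Arrays.asList(
                "http://node1.example.com:50060",
                "node2:50060",
                "http://node1.example.com:50060");
        System.out.println(countByHost(sample));
    }
}
```

In the loop above, each event.getTaskTrackerHttp() value could be passed to hostOf to group events by machine. How to weigh those per-host counts is a separate decision that depends on the job.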

Upvotes: 1
