New Contributer

Reputation: 509

Elapsed Time for a Hadoop Task

I have a cluster running YARN. It has 3 datanodes and 1 client node, and I submit all my jobs from the client node. How can I get the elapsed time for all the tasks in a particular job?

The RESTful API (https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html) can probably be used for this purpose, but I am curious whether there is a Java API that does the same.

I am able to find the start time of every task using the getStartTime() method of the TaskReport class. Although the clocks of the cluster nodes are synced using NTP, I don't think it is good practice to use the client's current system time (System.currentTimeMillis()) to calculate the elapsed time of running tasks, since there can still be some lag between the nodes in a cluster even with NTP.

Upvotes: 1

Views: 1846

Answers (1)

Thomas Jungblut

Reputation: 20969

The Job class has a method called getTaskReports.

You could use it like this to retrieve the duration of each map task:

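import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskReport;
import org.apache.hadoop.mapreduce.TaskType;

Job job = ...;
job.waitForCompletion(true); // block until the job finishes; true prints progress

// One report per map task; pass TaskType.REDUCE for the reduce tasks.
TaskReport[] reports = job.getTaskReports(TaskType.MAP);
for (TaskReport report : reports) {
    // Start and finish times both come from the task report, not the client clock.
    long time = report.getFinishTime() - report.getStartTime();
    System.out.println(report.getTaskId() + " took " + time + " millis!");
}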
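
For tasks that are still running there is no finish time to subtract yet. The following is only a minimal sketch of how that case could be handled, and it rests on two assumptions: that an unfinished task reports a finish time of 0, and that you have some reference "now" you are willing to trust (which is exactly where the clock-skew concern from the question comes in). The class TaskElapsed and the method elapsedMillis are hypothetical names, not part of Hadoop.

```java
final class TaskElapsed {

    /**
     * Hypothetical helper, not a Hadoop API.
     * Assumption: finishTime == 0 means the task has not finished yet.
     * referenceNow is used only for such unfinished tasks.
     */
    static long elapsedMillis(long startTime, long finishTime, long referenceNow) {
        if (startTime <= 0) {
            return 0L; // task has not started, so nothing has elapsed
        }
        long end = finishTime > 0 ? finishTime : referenceNow;
        // Clamp at zero so that clock skew cannot produce a negative duration.
        return Math.max(0L, end - startTime);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Finished task: the reference time is ignored.
        System.out.println(elapsedMillis(1_000L, 4_500L, now));
        // Running task: falls back to the reference time.
        System.out.println(elapsedMillis(now - 2_000L, 0L, now));
    }
}
```

Inside the loop above you could call it as elapsedMillis(report.getStartTime(), report.getFinishTime(), System.currentTimeMillis()). For finished tasks the result does not depend on the client clock at all; for running tasks it is only as accurate as the clock synchronisation between the client and the cluster.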

Upvotes: 1
