Reputation: 3637
I would like to measure the time taken for map and reduce when performing I/O (reading from HDFS) in Hadoop. I am using Yarn. Hadoop 2.6.0. What are the options for that?
Upvotes: 2
Views: 405
Reputation: 3688
If you need exact measurements, you could use BTrace: add it as a javaagent to your tasks via mapreduce.{map,reduce}.java.opts and then write a script which measures whatever you like. Samples of BTrace scripts are here.
There is also HTrace, which might be helpful as well.
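As a sketch of the wiring, something like the following job submission would attach the agent to every map and reduce JVM. The jar path, script name, and heap size are placeholders; adjust them to your cluster:

```shell
# Hypothetical paths: point these at wherever the btrace agent jar
# and your compiled script actually live on the task nodes.
hadoop jar my-job.jar com.example.MyJob \
  -D mapreduce.map.java.opts="-Xmx1g -javaagent:/opt/btrace/btrace-agent.jar=script=/opt/btrace/IoTimer.class" \
  -D mapreduce.reduce.java.opts="-Xmx1g -javaagent:/opt/btrace/btrace-agent.jar=script=/opt/btrace/IoTimer.class" \
  /input /output
```

Note that these opts replace, not append to, any defaults set in mapred-site.xml, so carry over flags like -Xmx yourself.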
Upvotes: 1
Reputation: 4141
One rough estimate could come from custom counters. In both the mapper and the reducer, record the timestamp when processing starts and when it ends, compute the difference, and add it to a custom counter: mappers add to MAPPER_RUNNING_TIME and reducers add to REDUCER_RUNNING_TIME (or whatever names you like). When the job finishes, subtract the aggregated values of your counters from the built-in MILLIS_MAPS and MILLIS_REDUCES counters, respectively; the remainder approximates the non-processing (largely I/O and framework) time. You might need to look into the Hadoop source code, though, to confirm whether staging time is included in MILLIS_MAPS and MILLIS_REDUCES. With this estimate, keep in mind that tasks run concurrently, so the result is a total aggregated across all mappers and reducers rather than wall-clock time.
I have not done this personally, but I think this solution could work unless you find a better one.
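The mapper side of this idea could look roughly like the sketch below (the counter group "CUSTOM" and the class names are illustrative, not from any existing code; the reducer would mirror the same pattern with REDUCER_RUNNING_TIME):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that records its own wall-clock running time
// in a custom counter. The framework aggregates the counter across
// all map tasks, so at job end you can subtract the total from MILLIS_MAPS.
public class TimedMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private long startMillis;

    @Override
    protected void setup(Context context) {
        // Timestamp taken when this task starts processing.
        startMillis = System.currentTimeMillis();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... your actual map logic ...
    }

    @Override
    protected void cleanup(Context context) {
        long elapsed = System.currentTimeMillis() - startMillis;
        context.getCounter("CUSTOM", "MAPPER_RUNNING_TIME").increment(elapsed);
    }
}
```

After the job completes, read both counters from the Job object (or the history UI) and compute MILLIS_MAPS minus MAPPER_RUNNING_TIME as the rough overhead estimate.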
Upvotes: 1