Reputation: 73
What is the best way to display the execution time of a multi-step MapReduce job?
I tried to set an instance variable in the mapper_init of step 1 of the job:
    def mapper_init_timer(self):
        self.start = time.process_time()
But when I try to read it in the reducer_final of step 2:
    def reducer_final_timmer(self):
        # self.start is None here
        self.set_status("total time")
I can't figure out why the instance variable is lost between steps. If that is by design, how can we calculate the execution time of an MRJob script in a way that also gives the correct result when run with -r hadoop?
Upvotes: 2
Views: 1020
Reputation: 18270
The simplest way would be to record the time before and after invoking run()
and take the difference:
    from datetime import datetime
    import sys

    if __name__ == '__main__':
        start_time = datetime.now()
        MRJobClass.run()  # MRJobClass is your MRJob subclass
        end_time = datetime.now()
        elapsed_time = end_time - start_time
        # write() needs a str, not a timedelta
        sys.stderr.write(str(elapsed_time) + '\n')
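As for why self.start disappears: the mapper and reducer tasks of each step run as separate processes, so attributes set on self in one task are not visible in another. That is why the timing has to happen in the driver script. The same idea can be written as a small reusable helper. This is only a sketch: timed_run and fake_job are hypothetical names, and fake_job stands in for MRJobClass.run so the snippet runs without mrjob installed. It uses time.perf_counter() for wall-clock time, since time.process_time() counts only the CPU time of the current process and would miss time spent waiting on the cluster.

```python
import sys
import time


def timed_run(run_job):
    """Call run_job() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = run_job()
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_job():
    # Hypothetical stand-in for MRJobClass.run(); it sleeps instead of running steps.
    time.sleep(0.05)
    return "done"


if __name__ == "__main__":
    _, seconds = timed_run(fake_job)
    sys.stderr.write("total time: %.3f s\n" % seconds)
```

In a real script you would pass MRJobClass.run in place of fake_job.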
Upvotes: 1