Prabhash

Reputation: 73

MrJob multi step job execution time

What is the best way to display execution time of a multi step map reduce job?

I tried setting an attribute on self in the mapper_init of step 1 of the job:

    def mapper_init_timer(self):
        self.start = time.process_time()

But when I try to read it in the reducer_final of step 2, it is no longer set:

    def reducer_final_timer(self):
        # self.start is None here
        self.set_status("total time")

I can't figure out why the attribute on self is lost between steps. If that is by design, how can I measure the execution time of an mrjob script in a way that also gives the correct result when run with -r hadoop?
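A side note on the timer above: time.process_time() returns the CPU time of the current process only, so it leaves out time spent sleeping or waiting on I/O, and it cannot see work done in other processes. For the duration of a whole job, a wall-clock timer such as time.perf_counter() is the better fit. The sketch below compares the two clocks around a sleep; compare_clocks is a hypothetical helper used only for illustration, and the sleep stands in for non-CPU work.

```python
import time


def compare_clocks(pause_seconds=0.2):
    """Return (wall_seconds, cpu_seconds) measured around a sleep."""
    wall_start = time.perf_counter()  # monotonic wall-clock timer
    cpu_start = time.process_time()   # CPU time of this process only
    time.sleep(pause_seconds)         # waiting, not computing
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = compare_clocks()
    print(f"wall-clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

The wall-clock figure includes the pause while the CPU figure stays close to zero, which is why process_time() would under-report a job's elapsed time even if the attribute did survive between steps.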

Upvotes: 2

Views: 1020

Answers (1)

franklinsijo

Reputation: 18270

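Each mapper and reducer task runs with its own fresh instance of the job class (on Hadoop, in its own process), so attributes set on self in one step are not visible in another. The simplest way is to record the time before and after calling run() and take the difference: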

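    from datetime import datetime
    import sys

    if __name__ == '__main__':
        start_time = datetime.now()
        MRJobClass.run()  # replace MRJobClass with your own job class
        end_time = datetime.now()
        elapsed_time = end_time - start_time
        # sys.stderr.write() needs a str, not a timedelta
        sys.stderr.write("Total time: %s\n" % elapsed_time)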
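A slightly more defensive variant of the same idea, offered as a sketch: run_timed is a hypothetical helper (not part of mrjob) that takes any callable, such as a job class's run, and reports wall-clock time measured with time.perf_counter(). The try/finally means the time is still written if run() ends by raising SystemExit (mrjob may exit this way when a step fails). MRMultiStepJob and my_job are placeholder names.

```python
import sys
import time


def run_timed(run, label="job"):
    """Call run() and write its wall-clock duration to stderr, even on exit."""
    start = time.perf_counter()
    try:
        return run()
    finally:
        # Runs on normal return and when run() raises (including SystemExit).
        elapsed = time.perf_counter() - start
        sys.stderr.write(f"{label} took {elapsed:.2f}s\n")


if __name__ == "__main__":
    # Hypothetical usage with an mrjob job class:
    #     from my_job import MRMultiStepJob
    #     run_timed(MRMultiStepJob.run, label="MRMultiStepJob")
    run_timed(lambda: time.sleep(0.1), label="demo")
```

Note that mrjob also launches the same script for each mapper and reducer task, so the `__main__` block executes there too. Each task then logs its own short timing line to its stderr; only the line written by the process you started yourself reflects the whole multi-step job.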

Upvotes: 1
