Jayant Jadhav

Reputation: 318

Hadoop Mapper Context object

How does the Hadoop framework call the run() method of a Mapper or Reducer class? The framework calls run() with a Context object as its argument, so how does Hadoop construct and pass that object, and what information does it hold?

Upvotes: 5

Views: 8572

Answers (2)

Niranjan Sarvi

Reputation: 899

The run() method is invoked through Java runtime polymorphism (i.e. method overriding). As you can see on line# 569 of the Hadoop source for MapTask.java (excerpted below), the extended mapper/reducer is instantiated using the Java Reflection API. The MapTask class gets the name of the extended mapper/reducer from the Job configuration object, which the client program populated by calling job.setMapperClass().
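For illustration, this is roughly how MapTask instantiates the configured mapper via reflection (paraphrased from the Hadoop source; generics are trimmed, and taskContext/job are variables MapTask already has in scope):

// job.setMapperClass(MyMapper.class) stored the class name in the
// configuration; getMapperClass() reads it back, and
// ReflectionUtils.newInstance() creates the instance reflectively.
org.apache.hadoop.mapreduce.Mapper mapper =
    (org.apache.hadoop.mapreduce.Mapper)
        ReflectionUtils.newInstance(taskContext.getMapperClass(), job);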

The following code is taken from the Hadoop source of MapTask.java:

// line# 616: create the Mapper.Context with the job configuration,
// task id, input/output, committer, reporter, and input split
mapperContext = contextConstructor.newInstance(mapper, job, getTaskID(),
                                               input, output, committer,
                                               reporter, split);

input.initialize(split, mapperContext);  // prepare the record reader for this split
mapper.run(mapperContext);               // line# 621: hand the context to the mapper
input.close();

Line# 621 is an example of runtime polymorphism. On this line, MapTask calls the run() method of the configured mapper, passing the Mapper.Context as the parameter. If run() is not overridden, the call falls through to the default run() of org.apache.hadoop.mapreduce.Mapper, which in turn calls the map() method of the configured mapper once per input key/value pair.
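For reference, the default run() in org.apache.hadoop.mapreduce.Mapper is essentially the following loop (paraphrased from the Hadoop source):

public void run(Context context) throws IOException, InterruptedException {
  setup(context);                   // one-time per-task initialization hook
  while (context.nextKeyValue()) {  // the context advances through the split
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);                 // one-time per-task teardown hook
}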

On line# 616, MapTask creates the context object with all the details of the job configuration, etc., as mentioned by @harpun, and then passes it to the run() method on line# 621.

The above explanation holds for the reduce task as well, with the ReduceTask class being the corresponding entry point.
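Analogously, the default run() of org.apache.hadoop.mapreduce.Reducer calls reduce() once per distinct key (again paraphrased from the Hadoop source):

public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKey()) {  // advance to the next distinct key
    reduce(context.getCurrentKey(), context.getValues(), context);
  }
  cleanup(context);
}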

Upvotes: 3

harpun

Reputation: 4110

Yes, the run() method of the mapper is called by the MR framework when running the map task attempt. As far as the context is concerned, take a look at the documentation for Mapper.Context; the implemented interfaces in particular, together with their Javadocs, give you a full overview of the information the context carries. Through the context you can access data like the following (see the sketch after the list):

  • job information (job configuration, mapper/reducer class names, job name, working directory)
  • status of the currently executed task attempt
  • current key, value, input split (map task specific information)
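For example, a mapper can read this information from the context like so (a minimal sketch; the class name and the WordCount-style key/value types are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ContextAwareMapper
    extends Mapper<LongWritable, Text, Text, LongWritable> {

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // job-level and task-level information exposed by Mapper.Context
    System.out.println("Job name:     " + context.getJobName());
    System.out.println("Working dir:  " + context.getWorkingDirectory());
    System.out.println("Task attempt: " + context.getTaskAttemptID());
    System.out.println("Input split:  " + context.getInputSplit());
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // key and value are the current record handed in by the framework's run() loop
    context.write(value, new LongWritable(1));
  }
}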

Of course, a similar context object exists for the Reducer: Reducer.Context.

Upvotes: 0
