Reputation: 317
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Mapper.html#method.summary
run(Context) method of org.apache.hadoop.mapreduce.Mapper
The documentation says: "Expert users can override this method for more complete control over the execution of the Mapper."
What is the current default behavior of the run(Context) method?
If I override run(Context), what kind of special control do I get, as described in the documentation?
Has anyone overridden this method in their own implementations?
Upvotes: 1
Views: 441
Reputation: 9844
- What is the current default behavior of the run(Context) method?
The default implementation is visible in the Apache Hadoop source code for the Mapper class:
    /**
     * Expert users can override this method for more complete control over the
     * execution of the Mapper.
     * @param context
     * @throws IOException
     */
    public void run(Context context) throws IOException, InterruptedException {
      setup(context);
      try {
        while (context.nextKeyValue()) {
          map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
      } finally {
        cleanup(context);
      }
    }
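To summarize:
- Call setup for one-time initialization.
- Iterate over every key/value pair in the input and pass each one to the map method implementation.
- Call cleanup for one-time teardown. Because the call sits in a finally block, it runs even if map throws.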
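The call order of the default run can be traced with a small standalone sketch. It does not use Hadoop: TracingMapper and traceFor are hypothetical stand-ins, and a plain Iterator<String> takes the place of the Context, so this only illustrates the setup / map / cleanup ordering, not the real API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LifecycleSketch {
    /** Hypothetical stand-in for a mapper; NOT the real org.apache.hadoop.mapreduce.Mapper. */
    static class TracingMapper {
        final List<String> trace = new ArrayList<>();

        void setup() { trace.add("setup"); }
        void map(String record) { trace.add("map:" + record); }
        void cleanup() { trace.add("cleanup"); }

        // Same shape as the default run(Context): setup, loop over input, cleanup in finally.
        void run(Iterator<String> records) {
            setup();
            try {
                while (records.hasNext()) {
                    map(records.next());
                }
            } finally {
                cleanup();
            }
        }
    }

    /** Runs a fresh TracingMapper over the records and returns the recorded call order. */
    static List<String> traceFor(List<String> records) {
        TracingMapper mapper = new TracingMapper();
        mapper.run(records.iterator());
        return mapper.trace;
    }

    public static void main(String[] args) {
        // prints [setup, map:a, map:b, cleanup]
        System.out.println(traceFor(List.of("a", "b")));
    }
}
```

With an empty input the trace is just setup followed by cleanup, which matches the while loop in the real method never entering its body.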
- If I override run(Context), what kind of special control do I get, as described in the documentation?
The default implementation always follows a specific sequence of execution in a single thread. Overriding this would be rare, but it might open up possibilities for highly specialized implementations, such as different threading models or attempting to coalesce redundant key ranges.
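As a much simpler illustration of the kind of control an override gives, the sketch below changes the loop so that it stops after a fixed number of records while still calling setup and cleanup. It is standalone Java, not Hadoop code: BaseMapper, LimitedMapper and runLimited are hypothetical names, and an Iterator<String> stands in for the Context. In a real Mapper subclass the equivalent change would go in an override of run(Context) around context.nextKeyValue().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class RunOverrideSketch {
    /** Hypothetical stand-in with the same setup/map/cleanup/run shape as a mapper. */
    static class BaseMapper {
        final List<String> output = new ArrayList<>();

        void setup() {}
        void map(String record) { output.add(record.toUpperCase(Locale.ROOT)); }
        void cleanup() {}

        // Default sequence: process every record.
        void run(Iterator<String> records) {
            setup();
            try {
                while (records.hasNext()) {
                    map(records.next());
                }
            } finally {
                cleanup();
            }
        }
    }

    /** Overrides run to stop early once a record limit is reached. */
    static class LimitedMapper extends BaseMapper {
        private final int limit;

        LimitedMapper(int limit) { this.limit = limit; }

        @Override
        void run(Iterator<String> records) {
            setup();
            try {
                int seen = 0;
                // The only change from the default: an extra stop condition.
                while (seen < limit && records.hasNext()) {
                    map(records.next());
                    seen++;
                }
            } finally {
                cleanup();
            }
        }
    }

    /** Runs a LimitedMapper over the records and returns what it emitted. */
    static List<String> runLimited(List<String> records, int limit) {
        LimitedMapper mapper = new LimitedMapper(limit);
        mapper.run(records.iterator());
        return mapper.output;
    }

    public static void main(String[] args) {
        // prints [A, B]
        System.out.println(runLimited(List.of("a", "b", "c"), 2));
    }
}
```

Keeping setup and the try/finally around cleanup in the override preserves the lifecycle guarantees that map implementations usually rely on.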
- Has anyone overridden this method in their own implementations?
Within the Apache Hadoop codebase, there are two overrides of this:
- ChainMapper allows chaining together multiple Mapper class implementations for execution within a single map task. The override of run sets up an object representing the chain and passes each input key/value pair through that chain of mappers.
- MultithreadedMapper allows multi-threaded execution of another Mapper class. That Mapper class must be thread-safe. The override of run starts multiple threads that iterate over the input key/value pairs and pass them through the underlying Mapper.
Upvotes: 2