Reputation: 147
In MapReduce, how does a reduce task differ from a Reducer?
What is the relationship between a reduce task and a Reducer?
Does the Reducer perform the reduce task?
Many Thanks
Upvotes: 1
Views: 151
Reputation: 38910
From the Apache documentation:
Reducer reduces a set of intermediate values which share a key to a smaller set of values.
Reducer has 3 primary phases:
Shuffle
The input to the Reducer is the grouped output of a Mapper. In this phase the framework, for each Reducer, fetches the relevant partition of the output of all the Mappers, via HTTP.
Sort
The framework groups Reducer inputs by keys (since different Mappers may have output the same key) in this stage.
Reduce
In this phase the reduce(Object, Iterator, OutputCollector, Reporter) method is called for each <key, (list of values)> pair in the grouped inputs.
The output of the Reduce task is typically written to the FileSystem via OutputCollector.collect(Object, Object).
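The three phases quoted above can be sketched without Hadoop at all. The following is a minimal, illustrative model of a single reduce task, not the framework's actual implementation: the class `ReducePhasesSketch` and its methods are hypothetical names, the "fetch over HTTP" step is replaced by plain in-memory lists, and the reduce logic is assumed to be a simple sum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free model of the three phases of one reduce task.
class ReducePhasesSketch {

    // Shuffle: collect this task's partition segment from every mapper.
    // (In Hadoop the segments are fetched over HTTP; here they are lists.)
    static List<Map.Entry<String, Integer>> shuffle(
            List<List<Map.Entry<String, Integer>>> segmentsFromMappers) {
        List<Map.Entry<String, Integer>> fetched = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> segment : segmentsFromMappers) {
            fetched.addAll(segment);
        }
        return fetched;
    }

    // Sort: group values by key, since different mappers may emit the same key.
    static TreeMap<String, List<Integer>> sortAndGroup(
            List<Map.Entry<String, Integer>> fetched) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : fetched) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: called once per key with all of that key's values (assumed: a sum).
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in order and returns the task's output.
    static TreeMap<String, Integer> runTask(
            List<List<Map.Entry<String, Integer>>> segmentsFromMappers) {
        TreeMap<String, Integer> output = new TreeMap<>();
        sortAndGroup(shuffle(segmentsFromMappers))
            .forEach((key, values) -> output.put(key, reduce(key, values)));
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> fromMapper1 =
            List.of(Map.entry("apple", 1), Map.entry("pear", 1));
        List<Map.Entry<String, Integer>> fromMapper2 =
            List.of(Map.entry("apple", 1));
        System.out.println(runTask(List.of(fromMapper1, fromMapper2)));
        // prints {apple=2, pear=1}
    }
}
```

Note that the key "apple" arrives from two different mappers and is only combined into one group in the sort step, which is why that step has to happen before reduce is called.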
Note that, apart from the Reducer, a Combiner also invokes the reduce function, since it also implements the Reducer interface.
Reducer is a class that contains the reduce function, shown below:
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
    throws IOException, InterruptedException {
  // per-key reduce logic goes here
}
A reduce task is a program running on a node that executes the reduce function of the Reducer class.
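To make the distinction concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. All names in it (`SketchReducer`, `SumReducer`, `SketchReduceTask`) are hypothetical stand-ins rather than Hadoop classes. The assumption is that the reducer holds only the per-key logic, while the task is the unit of execution that owns one partition of the data, creates its own reducer object, and drives it key by key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a Reducer: it only knows what to do with one key.
abstract class SketchReducer {
    void setup() {}                                   // called once per task
    abstract int reduce(String key, Iterable<Integer> values);
    void cleanup() {}                                 // called once per task
}

// A user-written reducer: sums the values of each key.
class SumReducer extends SketchReducer {
    @Override
    int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}

// Hypothetical stand-in for a reduce task: it owns one partition of grouped
// input and runs a reducer object over it.
class SketchReduceTask {
    private final TreeMap<String, List<Integer>> partition;

    SketchReduceTask(TreeMap<String, List<Integer>> partition) {
        this.partition = partition;
    }

    Map<String, Integer> run() {
        SketchReducer reducer = new SumReducer();     // one reducer object per task
        Map<String, Integer> output = new TreeMap<>();
        reducer.setup();
        for (Map.Entry<String, List<Integer>> group : partition.entrySet()) {
            // reduce is invoked once for every key in this task's partition
            output.put(group.getKey(),
                       reducer.reduce(group.getKey(), group.getValue()));
        }
        reducer.cleanup();
        return output;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> partition = new TreeMap<>();
        partition.put("apple", List.of(1, 1, 1));
        partition.put("pear", List.of(1));
        System.out.println(new SketchReduceTask(partition).run());
        // prints {apple=3, pear=1}
    }
}
```

In this model, a job with several reduce tasks would create several `SketchReduceTask` objects, each with its own `SumReducer` and its own partition. The reducer class is written once; the tasks are what actually run.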
Upvotes: 1
Reputation: 1170
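A reduce task is simply a running instance of the Reducer: each task creates its own Reducer object and runs it over one partition of the map output.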
The number of reduce tasks is configurable.
It can be specified either by setting the property mapred.reduce.tasks (named mapreduce.job.reduces in Hadoop 2 and later) in the job configuration object,
or by calling the method org.apache.hadoop.mapreduce.Job#setNumReduceTasks(int).
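What the configured number means in practice is that every key is assigned to exactly one of that many tasks. The sketch below is plain Java with no Hadoop dependency; the class `PartitionSketch` and its methods are hypothetical, and the assignment rule is assumed to be the hash-modulo formula used by Hadoop's default hash partitioner. A job with a custom partitioner would assign keys differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of how the number of reduce tasks splits the keys.
class PartitionSketch {

    // Assumed rule: non-negative hash of the key, modulo the number of tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns, for each reduce task index, the keys that task would receive.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (int task = 0; task < numReduceTasks; task++) {
            byTask.put(task, new ArrayList<>());
        }
        for (String key : keys) {
            byTask.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return byTask;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "pear", "plum", "fig");
        System.out.println(assign(keys, 1)); // a single task receives every key
        System.out.println(assign(keys, 3)); // keys are spread over tasks 0..2
    }
}
```

With one reduce task all keys end up in the same place; with more tasks each key still goes to exactly one task, so all values for a given key are reduced together.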
Upvotes: 1