Reputation: 8903
I am trying to understand how multiple reducers run in a MapReduce job, and I learned that it is the partitioner that decides which (key, value) pairs go to which reducer.
Can we run multiple reducers without running a partitioner? Would that be a valid scenario?
Upvotes: 0
Views: 132
Reputation: 5538
Think of the partitioner as the entity that decides which reducer (bucket) will process a particular key-value pair (element) emitted by a mapper.
The default partitioner uses a hash of the key to divide the elements across reducers. An analogy is how Java's HashMap uses a hash function to decide the bucket (reducer) for an element (key-value pair).
In this process, it guarantees that the same key is always sent to a single reducer (which processes all the values for that key). So, if the mappers emit m unique keys (each key can occur any number of times) and there are n reducers, the partitioner tries to distribute the keys so that each reducer gets roughly m/n unique keys, each with the list of values associated with it. The split is only approximately even, since it depends on how the keys' hash codes are distributed.
Note that it is possible to set the number of reducers in the program. In effect, you are telling the partitioner how many buckets are available for distributing the keys.
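To make the bucket analogy concrete, here is a minimal standalone sketch. It does not use any Hadoop classes; `PartitionSketch`, `partitionFor` and `assign` are hypothetical names made up for illustration, and the arithmetic simply mirrors the hash-modulo approach of the default partitioner. It shows six unique keys spread over three reducers, with every occurrence of a key landing in the same bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Standalone illustration only: not a Hadoop class.
public class PartitionSketch {

    // Non-negative hash of the key, modulo the number of reducers.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups each key under the reducer index it maps to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : keys) {
            int p = partitionFor(key, numReduceTasks);
            buckets.computeIfAbsent(p, k -> new ArrayList<>()).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f");
        // m = 6 unique keys, n = 3 reducers: here each reducer gets m/n = 2 keys.
        System.out.println(assign(keys, 3)); // {0=[c, f], 1=[a, d], 2=[b, e]}
    }
}
```

These single-character keys happen to split evenly because their hash codes are consecutive; with real keys the split is only approximately m/n.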
Upvotes: 2
Reputation: 5521
If you don't specify a partitioner, the default HashPartitioner runs. It simply hashes the key:
    public int getPartition(K2 key, V2 value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
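The `& Integer.MAX_VALUE` in that method clears the sign bit, so the result is always a valid reducer index even when hashCode() is negative. The sketch below is a standalone illustration, not Hadoop code; `HashMaskSketch`, `maskedPartition` and `absPartition` are hypothetical names, and the `Math.abs` variant is shown only as a contrast to explain why the mask is used.

```java
// Standalone illustration only: not a Hadoop class.
public class HashMaskSketch {

    // Same arithmetic as the getPartition method above.
    static int maskedPartition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical alternative for comparison; breaks for Integer.MIN_VALUE,
    // because Math.abs(Integer.MIN_VALUE) is still negative.
    static int absPartition(int hash, int numReduceTasks) {
        return Math.abs(hash) % numReduceTasks;
    }

    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE;
        System.out.println(maskedPartition(hash, 3)); // 0
        System.out.println(absPartition(hash, 3));    // -2, not a valid reducer index
        // With a single reducer, every key maps to partition 0.
        System.out.println(maskedPartition(12345, 1)); // 0
    }
}
```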
Upvotes: 2