Reputation: 12288
If the title is right, what happens when a single key has too much data to be processed by one reducer?
If not, are there multiple reduce levels, where one reduce emits and another consumes? That doesn't seem right, because there would be problems when the input format differs from the output format, but I'm just confused.
This may be related: is the combiner only for LOCAL aggregation on one node, or for global aggregation across all nodes?
I really need more than a simple 'yes' or 'no'; an explanation would be appreciated!
Upvotes: 0
Views: 60
Reputation: 33545
Yes, all the data for a given key is sent to a single reducer. A combiner will help when most of the records share a single key, because it pre-aggregates each mapper's output before that output is sent to the reducer. I can't think of a better way to make the job complete faster.
This may be related: is the combiner only for LOCAL aggregation on one node, or for global aggregation across all nodes?
The combiner runs on the same node as the mapper and is for local aggregation; the reducer is for global aggregation across all the nodes in the cluster.
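To make the local vs. global distinction concrete, here is a minimal sketch in plain Python that simulates a word count. It is not Hadoop code: `run_job`, `combine`, `shuffle`, and the toy partitioner are hypothetical names made up for illustration, and each input split simply stands in for one mapper running on one node.

```python
from collections import defaultdict


def map_words(line):
    # Mapper: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


def combine(pairs):
    # Combiner: aggregates the output of ONE mapper only (local aggregation).
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


def shuffle(all_pairs, num_reducers):
    # Partition by key so every value for a key lands on the same reducer.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in all_pairs:
        # Deterministic toy partitioner (an assumption, not Hadoop's own).
        index = sum(key.encode()) % num_reducers
        partitions[index][key].append(value)
    return partitions


def reduce_counts(partition):
    # Reducer: sees the values for its keys from ALL mappers (global aggregation).
    return {key: sum(values) for key, values in partition.items()}


def run_job(splits, num_reducers=2, use_combiner=True):
    shuffled = []
    for split in splits:  # each split stands in for one mapper on one node
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    totals = {}
    for partition in shuffle(shuffled, num_reducers):
        totals.update(reduce_counts(partition))
    # Also return how many records had to cross the "network".
    return totals, len(shuffled)


# "a" is the hot key in both splits.
splits = [["a a a b", "a a"], ["a b a", "c a"]]
with_combiner, shuffled_with = run_job(splits, use_combiner=True)
without_combiner, shuffled_without = run_job(splits, use_combiner=False)
print(with_combiner, shuffled_with, shuffled_without)
```

Both runs give the same totals, but with the combiner far fewer records for the hot key reach its reducer. Note that this only works because summing is associative and commutative; the single reducer for that key still does the final, global sum.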
Upvotes: 1