Reputation: 3062
Every Hadoop developer knows that the Combiner is key to optimizing MapReduce, yet it is optional. It can minimize network bandwidth and improve job performance. My question is: Hadoop enables many features by default, such as data locality, but it does not run a Combiner by default. Why? Does that mean a Combiner is not advisable in every scenario? When should we not use a Combiner? And what would be the problem if it were made the default?
Upvotes: 0
Views: 7133
Reputation: 420
If you set a combiner on your job, Hadoop decides at runtime whether to actually run it, based on the data.
If you do not set a combiner, Hadoop never runs one.
When the combiner does run, it decreases the size of the map output, so a smaller amount of data travels over the network during the shuffle.
For difference between combiner and reducer, check below link:
http://blog.optimal.io/3-differences-between-a-mapreduce-combiner-and-reducer/
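To make the bandwidth saving concrete, here is a small simulation in plain Python (not actual Hadoop code; the word lists and counts are made-up illustrative data) of the classic word-count job, comparing how many key/value pairs would cross the network with and without local combining:

```python
from collections import Counter

# Hypothetical (word, 1) outputs from two separate map tasks.
map_outputs = [
    ["to", "be", "or", "not", "to", "be"],
    ["to", "see", "or", "not", "to", "see"],
]

# Without a combiner: every individual (word, 1) pair is shuffled.
pairs_without_combiner = sum(len(words) for words in map_outputs)

# With a combiner: each map task pre-sums its own counts, so only
# one (word, count) pair per distinct word per task is shuffled.
combined = [Counter(words) for words in map_outputs]
pairs_with_combiner = sum(len(c) for c in combined)

# The reducer produces the same final totals either way.
reduced = Counter()
for c in combined:
    reduced.update(c)

print(pairs_without_combiner)  # 12
print(pairs_with_combiner)     # 8
print(reduced["to"])           # 4
```

The final counts are identical; only the volume of intermediate data shrinks.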
Upvotes: -1
Reputation: 2406
A combiner can be used only when the reduce function is both commutative and associative, because values are combined locally, before the shuffle, in an arbitrary order and grouping.
Commutative - the order of the operands has no effect on the result:
1 + 2 + 3 = 1 + 3 + 2
Associative - the grouping of the operands has no effect on the result:
(1 + 2) + 3 = 1 + (2 + 3)
So a combiner is a good fit for an operation like sum(), but there are operations for which it does not work, such as a naive average. It is always the programmer's responsibility to decide whether a combiner may be used for a particular algorithm.
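A short plain-Python sketch (illustrative data, not Hadoop code) of why sum() is combiner-safe but a naive average is not, along with the usual workaround:

```python
# Values for a single key, split across two hypothetical map tasks.
partitions = [[1, 2, 3], [4, 5]]
all_values = [v for p in partitions for v in p]

# Sum is commutative and associative: pre-summing each partition
# (what a combiner would do) changes nothing.
sum_with_combiner = sum(sum(p) for p in partitions)
assert sum_with_combiner == sum(all_values)  # 15 == 15

# Average is not: averaging the per-partition averages is wrong
# whenever the partitions have different sizes.
def avg(xs):
    return sum(xs) / len(xs)

avg_of_avgs = avg([avg(p) for p in partitions])  # avg(2.0, 4.5) = 3.25
true_avg = avg(all_values)                       # 15 / 5 = 3.0
assert avg_of_avgs != true_avg

# The standard fix: have the combiner emit (sum, count) pairs,
# which ARE combinable, and divide only in the reducer.
pairs = [(sum(p), len(p)) for p in partitions]
total, count = map(sum, zip(*pairs))
assert total / count == true_avg
```

This is exactly the kind of check the programmer must make before wiring a reducer in as a combiner: if the reduce logic is not commutative and associative over partial groups, reshape the intermediate value (as with the (sum, count) pair here) or skip the combiner entirely.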
Upvotes: 5