Venu A Positive

Reputation: 3062

When should we not use a Combiner in MapReduce?

Every Hadoop developer knows that a Combiner is key to optimizing MapReduce, but it is optional. It can reduce network bandwidth and improve MapReduce job performance. My question is: Hadoop enables many features by default, such as data locality, but it does not enable a Combiner by default. Why? Does that mean a combiner is not recommended in every scenario? When should we not use a combiner? If I made it the default, what would the problem be?

Upvotes: 0

Views: 7133

Answers (2)

chandu kavar

Reputation: 420

If you set a combiner in your job, Hadoop decides at runtime, based on the data, whether to run it; it may run it zero, one, or several times.

If you do not set a combiner, Hadoop will not run one.

When the combiner runs, it reduces the size of the map output, so less data travels over the network during the shuffle.
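To make the size reduction concrete, here is a minimal pure-Python simulation (not Hadoop's API) of one word-count map task with and without a summing combiner. The function names `map_words` and `combine_sum` and the input split are hypothetical, chosen only for illustration.

```python
from collections import Counter


def map_words(lines):
    # Word-count style mapper: emit (word, 1) for every occurrence.
    return [(word, 1) for line in lines for word in line.split()]


def combine_sum(pairs):
    # Map-side local aggregation: collapse to one (word, partial_count) per key.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


# Hypothetical input split handled by a single map task.
split = ["a b a", "b a c"]
raw = map_words(split)
combined = combine_sum(raw)

print(len(raw), "records shuffled without a combiner")
print(len(combined), "records shuffled with a combiner")
print(combined)
```

Here the map task would send 6 records to the shuffle without the combiner and 3 with it (`[('a', 3), ('b', 2), ('c', 1)]`); the saving grows with the number of repeated keys per map task.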

For the differences between a combiner and a reducer, see this link:

http://blog.optimal.io/3-differences-between-a-mapreduce-combiner-and-reducer/

Upvotes: -1

vanekjar

Reputation: 2406

A combiner can be used only when the reduce function is both commutative and associative, because values are combined locally, before the shuffle, in arbitrary order and grouping.


Commutative - The order of the values has no effect on the result:

1 + 2 + 3 = 1 + 3 + 2

Associative - The way the operations on the values are grouped has no effect on the result:

(1 + 2) + 3 = 1 + (2 + 3)
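A small pure-Python simulation (not Hadoop's API) shows why these properties matter. Each hypothetical "plan" is one way the framework might split and order a single key's values before the combiner sees them; single-value chunks stand for the combiner not running at all. The same function is used as both combiner and reducer. The names `run`, `reduce_sum`, `reduce_sub`, and the plans themselves are illustrative assumptions.

```python
def reduce_sum(values):
    # Commutative and associative: safe to reuse as a combiner.
    return sum(values)


def reduce_sub(values):
    # a - b - c - ...: neither commutative nor associative.
    result = values[0]
    for v in values[1:]:
        result -= v
    return result


def run(reduce_fn, plan):
    # Apply reduce_fn as a combiner to each chunk, then as the reducer.
    partials = [reduce_fn(chunk) for chunk in plan]
    return reduce_fn(partials)


# Hypothetical ways the values [1..5] for one key could be split and ordered.
plans = [
    [[1], [2], [3], [4], [5]],  # combiner effectively not run
    [[1, 2], [3, 4, 5]],        # two spills, original order
    [[5, 3], [1], [4, 2]],      # different grouping and order
]

sum_results = [run(reduce_sum, p) for p in plans]
sub_results = [run(reduce_sub, p) for p in plans]
print(sum_results)  # [15, 15, 15]
print(sub_results)  # [-13, 5, -1]
```

With `sum` every plan gives the same answer, so the job's output does not depend on whether or how the combiner ran; with subtraction the answer changes from plan to plan.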

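So a combiner works well for, e.g., a sum() operation, but there are operations for which it doesn't work, such as computing an average (the average of partial averages is generally not the overall average). It is always the programmer's responsibility to decide whether a combiner can be used for a particular algorithm.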
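The average case can be sketched in plain Python (again a simulation, not Hadoop code). Averaging the per-spill averages gives the wrong answer; a common workaround, shown here as an illustration rather than as part of the answer above, is to have the combiner emit (sum, count) pairs, which are safe to add up in any order, and to divide only at the very end. The spill split and helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


values = [1, 2, 3, 4, 5, 6]
# Hypothetical split of one key's values across two map-side spills.
spills = [[1, 2], [3, 4, 5, 6]]

exact = mean(values)                     # 21 / 6 = 3.5
naive = mean([mean(s) for s in spills])  # mean(1.5, 4.5) = 3.0, wrong


def combine_pairs(pairs):
    # Sum (partial_sum, partial_count) pairs. The output has the same shape
    # as the input, so this step can be applied any number of times.
    return (sum(s for s, _ in pairs), sum(c for _, c in pairs))


# Mapper emits (value, 1); combiner sums pairs per spill; reducer divides last.
partials = [combine_pairs([(v, 1) for v in s]) for s in spills]
total_sum, total_count = combine_pairs(partials)
fixed = total_sum / total_count          # 21 / 6 = 3.5

print(exact, naive, fixed)
```

This prints `3.5 3.0 3.5`. The design point is that the combiner's output type matches its input type, which is what lets the framework run it zero or more times without changing the result.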

Upvotes: 5
