Definitive source for when Hadoop MapReduce runs a combiner

Question

There are already quite a few questions like this one, with conflicting answers. I've also found conflicting statements in the literature and on blogs. The book Hadoop: The Definitive Guide says:

Hadoop does not provide a guarantee of how many times it will call [the combiner] for a particular map output record, if at all. In other words, calling the combiner function zero, one or many times should produce the same output from the reducer.
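To make the quoted contract concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class `CombinerContract` and its methods `sumByKey`, `run`, and `sampleMapOutput` are hypothetical names made up for illustration, and the sketch applies the combiner to the whole map output at once, which simplifies whatever the framework actually does. It only shows the property the book describes: for a sum-style combiner, zero, one, or several combiner passes lead to the same reducer output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerContract {
    // Hypothetical stand-in for a sum combiner and a sum reducer:
    // groups records by key and adds up their values.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            out.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return out;
    }

    // Applies the combiner `times` times to the map output, then reduces once.
    // This is a simulation, not how Hadoop schedules combiner calls.
    static Map<String, Integer> run(List<Map.Entry<String, Integer>> mapOutput, int times) {
        List<Map.Entry<String, Integer>> current = mapOutput;
        for (int i = 0; i < times; i++) {
            current = new ArrayList<>(sumByKey(current).entrySet());
        }
        return sumByKey(current);
    }

    // Made-up map output for a word count: "a" three times, "b" once.
    static List<Map.Entry<String, Integer>> sampleMapOutput() {
        return List.of(
            Map.entry("a", 1), Map.entry("b", 1),
            Map.entry("a", 1), Map.entry("a", 1));
    }

    public static void main(String[] args) {
        for (int times : new int[] {0, 1, 3}) {
            System.out.println(times + " combiner pass(es): " + run(sampleMapOutput(), times));
        }
    }
}
```

All three runs print `{a=3, b=1}`. That works because addition is associative and commutative; a combiner that computed, say, an average of its inputs would not satisfy the contract.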

The answers to a similar question, "On what basis mapreduce framework decides whether to launch a combiner or not", suggest that a combiner, if defined, will always be called at least once, because the MapOutputBuffer needs to be flushed.

There might also be an edge case: if the mapper emits only a single record, the combiner might not run even though one is defined.

My question is this: Is there a definitive source for the answer to this question? I've searched the Hadoop documentation, of course, but can't find anything.

