arikabc

Reputation: 755

Hadoop: Reducer is called twice

I'm working with Hadoop on EMR. I wrote a simple program that runs a single map-reduce job. The output was not what I expected, and with debug prints I discovered that the reducer is actually called twice: once with the output of the mapper as input, and a second time with the output of the first reducer call as input.
The output of the second reducer run is what I get as the final output.
I'm using Hadoop 2.4.0 on AMI 3.1.1, and the reduce method signature is:

@Override  
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException 

Does anyone know why that may happen?

Upvotes: 3

Views: 1804

Answers (2)

arikabc

Reputation: 755

Thanks for your answer.
The problem was very simple: I copied the job configuration from a previous working job, in which the combiner and the reducer were the same class.
As a result, my reducer class was also set as the combiner, so the reduce logic ran twice: once as the combiner on the map output, and once as the actual reducer. Removing the combiner from the job configuration solved the problem.
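To illustrate why reusing a reducer as the combiner can change the result, here is a minimal sketch in plain Java that simulates the two passes without Hadoop. The class name `CombinerDemo`, the counting logic, and the sample values are hypothetical and are not taken from my actual job; real Hadoop may also run the combiner zero, one, or several times.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CombinerDemo {
    // Hypothetical reduce logic: emits how many values it received for the key.
    // Counting is not safe to apply twice, which makes the problem visible.
    static String reduce(String key, List<String> values) {
        return String.valueOf(values.size());
    }

    // Reducer only: the reduce logic sees the raw map output.
    static String runWithoutCombiner(String key, List<String> mapOutput) {
        return reduce(key, mapOutput);
    }

    // Same class used as combiner and reducer: the reducer's input
    // is the single value emitted by the combiner pass.
    static String runWithCombiner(String key, List<String> mapOutput) {
        String combined = reduce(key, mapOutput);
        return reduce(key, Collections.singletonList(combined));
    }

    public static void main(String[] args) {
        List<String> mapOutput = Arrays.asList("x", "y", "z");
        System.out.println(runWithoutCombiner("k", mapOutput)); // prints 3
        System.out.println(runWithCombiner("k", mapOutput));    // prints 1
    }
}
```

In a real driver, the equivalent fix is to drop the `job.setCombinerClass(...)` call, or to point it at a class whose output is safe to feed back into the reducer.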

Thanks again!

Upvotes: 6

Paul Sanwald

Reputation: 11329

Hadoop uses speculative execution by default, and there is no guarantee that your mapper or reducer will be run only once. This is why you should disable speculative execution for reduce tasks that have side effects. With speculative execution enabled, the same task can be attempted more than once, so it's quite possible to see duplicated log statements and similar effects.
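As a rough sketch of how speculative execution might be turned off in a driver, assuming the Hadoop 2.x `org.apache.hadoop.mapreduce.Job` API and the Hadoop libraries on the classpath: the class name and job name below are hypothetical, and the mapper, reducer, and input/output settings are left out.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NoSpeculationDriver {
    // Builds a job with speculative execution disabled for both phases.
    public static Job newJob() throws IOException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "no-speculation-example"); // hypothetical job name

        // Equivalent to setting mapreduce.map.speculative and
        // mapreduce.reduce.speculative to false in the configuration.
        job.setMapSpeculativeExecution(false);
        job.setReduceSpeculativeExecution(false);
        return job;
    }
}
```

Disabling only the reduce side is enough if the map tasks have no side effects.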

The answer here is to use the jobtracker to verify that the map and reduce tasks completed successfully, and were run in the way you expect. Because of speculative execution, looking at log statements is not a reliable way of determining this.

The other possibility is that your job is defined in such a way that the reducer is indeed being called twice, possibly incorrectly. You'd need to post the configuration of your job, and a lot more detail, for us to be able to work out whether this is in fact the case.

Upvotes: 5
