Hadoop MapReduce iterate over input values of a reduce call

Question

I'm testing a simple MapReduce application, but I'm stuck trying to understand what happens when I iterate over the input values of a reduce call.

This is the piece of code that behaves strangely:

public void reduce(Text key, Iterable<E> values, Context context)
    throws IOException, InterruptedException {

    Iterator<E> statesIter = values.iterator();
    // Keep a reference to the first value
    E first = statesIter.next();

    while (statesIter.hasNext()) {
        E state = statesIter.next();

        // Expected to print the same string on every pass
        System.out.println(first.toString());
        // some other stuff
    }
    // some other stuff
}

Nothing unusual there, except that each println invocation prints a different string, even though it always prints first. So every time I call the next() method, the object referenced by first changes.

Why does this happen?

TC1 · Accepted Answer

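It's somewhat counter-intuitive, but it is documented in the API docs: Hadoop reuses the key and value objects. Each call to next() refills the same object with the next record, so first ends up showing whatever was read last. You should clone the values if you want to keep them around.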
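
To make the reuse concrete, here is a minimal sketch that reproduces the effect without Hadoop. Everything in it is hypothetical: State stands in for the value type E from the question, and reusingIterator only imitates an iterator that hands back one shared object. It is not Hadoop's actual implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {

    /** Hypothetical mutable value type, standing in for E in the question. */
    static final class State {
        private String value = "";

        State() {}

        /** Copy constructor: takes a snapshot of another State. */
        State(State other) { this.value = other.value; }

        void set(String v) { this.value = v; }

        @Override
        public String toString() { return value; }
    }

    /** Imitates object reuse: every next() refills and returns the SAME State instance. */
    static Iterator<State> reusingIterator(List<String> raw) {
        final State shared = new State();
        final Iterator<String> source = raw.iterator();
        return new Iterator<State>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public State next() {
                shared.set(source.next());
                return shared;
            }
        };
    }

    /** Keeps only a reference to the first value, as in the question. */
    static String firstWithoutCopy(List<String> raw) {
        Iterator<State> it = reusingIterator(raw);
        State first = it.next();
        while (it.hasNext()) {
            it.next(); // overwrites the object that first points to
        }
        return first.toString();
    }

    /** Copies the first value before advancing the iterator. */
    static String firstWithCopy(List<String> raw) {
        Iterator<State> it = reusingIterator(raw);
        State first = new State(it.next());
        while (it.hasNext()) {
            it.next(); // first is an independent copy and is unaffected
        }
        return first.toString();
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("a", "b", "c");
        System.out.println(firstWithoutCopy(raw)); // prints "c"
        System.out.println(firstWithCopy(raw));    // prints "a"
    }
}
```

In a real reducer the copy depends on the value type. For Text values, new Text(value) makes an independent copy; for other Writable types, WritableUtils.clone(value, context.getConfiguration()) is one option, assuming the type can be serialized and deserialized normally.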
