Matteo

Hadoop MapReduce iterate over input values of a reduce call

I'm testing a simple MapReduce application, but I'm a little stuck trying to understand what happens when I iterate over the input values of a reduce call.

This is the piece of code that behaves strangely:

// Excerpt from a Reducer subclass
public void reduce(Text key, Iterable<E> values, Context context)
        throws IOException, InterruptedException {

    Iterator<E> statesIter = values.iterator();
    E first = statesIter.next();   // keep a reference to the first value

    while (statesIter.hasNext()) {
        E state = statesIter.next();

        System.out.println(first.toString());   // prints a different string each time
        // some other stuff
    }
    // some other stuff
}

So nothing strange, except that each println call actually prints a different string. Every time I call next(), the object referenced by first changes.

Why does this happen?

Answers (1)

TC1

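It's somewhat counter-intuitive, but it is documented in the API docs: Hadoop reuses the key and value objects passed into reduce, so each call to next() overwrites the contents of the same instance instead of returning a new one. Your first variable therefore points at that shared object and shows whatever value was read last. If you want to keep a key or value around, you should clone it.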
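To make the reuse concrete, here is a minimal plain-Java sketch that does not depend on Hadoop. MutableValue and ReusingIterable are hypothetical stand-ins for a Writable and for the reduce-side value iterator; the assumption, matching the behavior described above, is that the iterator hands back the same instance on every next() call and only overwrites its contents. firstWithoutCopy reproduces the question's loop, while firstWithCopy copies the first value before advancing. In real Hadoop code the copy would typically be something like new Text(value) for Text values, or WritableUtils.clone(value, context.getConfiguration()) for a generic Writable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {

    // Hypothetical mutable value type standing in for a Hadoop Writable such as Text.
    static final class MutableValue {
        private String data;

        MutableValue() {}

        // Copy constructor, analogous to new Text(otherText).
        MutableValue(MutableValue other) { this.data = other.data; }

        void set(String data) { this.data = data; }

        @Override
        public String toString() { return data; }
    }

    // Hypothetical iterable that, like the reduce-side value iterator, returns
    // the SAME instance on every next() call and only overwrites its contents.
    static final class ReusingIterable implements Iterable<MutableValue> {
        private final List<String> raw;

        ReusingIterable(List<String> raw) { this.raw = raw; }

        @Override
        public Iterator<MutableValue> iterator() {
            final MutableValue shared = new MutableValue();
            return new Iterator<MutableValue>() {
                private int pos = 0;

                @Override
                public boolean hasNext() { return pos < raw.size(); }

                @Override
                public MutableValue next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    shared.set(raw.get(pos++)); // overwrite in place, no new object
                    return shared;
                }
            };
        }
    }

    // Mirrors the question: keep a reference to the first value, then read it each iteration.
    // Assumes at least one value, as in a reduce call.
    static List<String> firstWithoutCopy(Iterable<MutableValue> values) {
        Iterator<MutableValue> it = values.iterator();
        MutableValue first = it.next();           // only a reference to the shared object
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            it.next();
            seen.add(first.toString());           // reflects whatever next() wrote last
        }
        return seen;
    }

    // Same loop, but the first value is copied before the iterator advances.
    static List<String> firstWithCopy(Iterable<MutableValue> values) {
        Iterator<MutableValue> it = values.iterator();
        MutableValue first = new MutableValue(it.next()); // independent copy
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            it.next();
            seen.add(first.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c");
        System.out.println(firstWithoutCopy(new ReusingIterable(input))); // [b, c]
        System.out.println(firstWithCopy(new ReusingIterable(input)));    // [a, a]
    }
}
```

Running main prints [b, c] and then [a, a]: without the copy, first tracks whatever the iterator wrote last, which is the symptom described in the question.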