Reputation: 1
I have a basic scenario in Hadoop:
All mappers send all values to the same key. Therefore all values end up on the same reducer.
However, when I iterate the values in the reducer, the iterator does not process all of the entries.
For example, I could have the following code:
while (values.hasNext())
{
    result = result + values.next().toString() + "\n";
}
// Assume that all values sent to this reducer are now in the 'result' variable
do_important_stuff(result);
I would like to accumulate the associated values and then process the data in the function 'do_important_stuff()'. But I am not able to do so: the while loop terminates too soon.
Am I missing a crucial point about Hadoop? Is my assumption wrong?
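For context, the mapper side of this scenario might look like the following minimal sketch. It assumes the old org.apache.hadoop.mapred API and plain text input; the class name and the constant key are illustrative assumptions, not code from the question:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SingleKeyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // One shared key means every value is routed to the same reducer.
    private static final Text ALL = new Text("all");

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        output.collect(ALL, line);
    }
}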
Upvotes: 0
Views: 514
Reputation: 1
The problem seems to be caused by storing references instead of copies: Hadoop reuses the same Text instance for every value the iterator returns, so references saved during the loop all end up pointing at the last value.
With an ArrayList as the accumulator and a copy made of every value, e.g.:
result = new ArrayList<Text>();
while (values.hasNext())
{
    result.add(new Text(values.next()));
}
the loop runs to completion with all the desired values in the list.
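For reference, a minimal sketch of a complete reducer along these lines, written against the old org.apache.hadoop.mapred API that the hasNext()/next() calls suggest. The class name is an assumption, and the final collect() stands in for the question's do_important_stuff():

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CollectAllReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        List<Text> result = new ArrayList<Text>();
        while (values.hasNext()) {
            // Hadoop hands back the same Text instance on every next() call,
            // so copy the value rather than storing the reference.
            result.add(new Text(values.next()));
        }
        // Hypothetical stand-in for do_important_stuff(): emit the number
        // of values accumulated for this key.
        output.collect(key, new Text(Integer.toString(result.size())));
    }
}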
Upvotes: 0
Reputation: 88378
You are controlling the loop with values.hasNext() but advancing with rows.next(). Are rows and values the same object? I suspect a typo. :)
Upvotes: 1