MartinL

Reputation: 1

Hadoop Reducer unable to accumulate all values in one iteration

I have a basic scenario in Hadoop:

All mappers send all values to the same key. Therefore all values end up on the same reducer.
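
For illustration, a mapper that does this might look roughly like the sketch below, assuming the old org.apache.hadoop.mapred API (the class name and the constant key are made up):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ConstantKeyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text>
{
  // Hypothetical constant key: every record is emitted under it,
  // so a single reducer receives all values.
  private static final Text KEY = new Text("all");

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException
  {
    output.collect(KEY, line);
  }
}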

However, when I iterate the values in the reducer, the iterator does not process all of the entries.

For example, I could have the following code:

String result = "";
while (values.hasNext())
{
  result = result + values.next().toString() + "\n";
}
// Assume that all values sent to this reducer are now in the 'result' variable
do_important_stuff(result);

I would like to accumulate the associated values and then process the data in the function do_important_stuff(). But I am not able to do so: the while loop breaks too soon.

Am I missing a crucial point about Hadoop? Is my assumption wrong?
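
For context, the loop above lives inside a reduce method that, under the old org.apache.hadoop.mapred API, would have roughly the following shape (a sketch; the enclosing class is omitted):

public void reduce(Text key, Iterator<Text> values,
                   OutputCollector<Text, Text> output, Reporter reporter)
    throws IOException
{
  String result = "";
  while (values.hasNext())
  {
    result = result + values.next().toString() + "\n";
  }
  // Assume that all values sent to this reducer are now in 'result'
  do_important_stuff(result);
}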

Upvotes: 0

Views: 514

Answers (2)

MartinL

Reputation: 1

The problem seems to be caused by storing references instead of copies: Hadoop reuses the same Writable instance for every value the iterator returns, so each call to next() hands back the same object.

With an ArrayList as accumulator and a copy of every value, for example:

List<Text> result = new ArrayList<Text>();
while (values.hasNext())
{
  // Copy the value: Hadoop reuses the same Text instance on every next()
  result.add(new Text(values.next()));
}

the loop runs to completion with all the desired values in the list.
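
In context, a complete reducer using this copy-before-store pattern might look roughly like the sketch below, again assuming the old org.apache.hadoop.mapred API (the class name and the final output line are made up):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CopyingReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text>
{
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException
  {
    List<Text> result = new ArrayList<Text>();
    while (values.hasNext())
    {
      // Copy each value; storing the iterator's object directly would
      // leave the list full of references to one reused instance.
      result.add(new Text(values.next()));
    }
    // All values for this key are now accumulated in 'result'.
    output.collect(key, new Text("count=" + result.size()));
  }
}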

Upvotes: 0

Ray Toal

Reputation: 88378

You are controlling the loop with

values.hasNext()

but are advancing with

rows.next()

Are rows and values the same object? I suspect a typo. :)
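
If it is indeed a typo, the fix is to control and advance with the same iterator object (a sketch, assuming rows was meant to be values):

while (values.hasNext())
{
  result = result + values.next().toString() + "\n";
}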

Upvotes: 1
