Reputation: 1
I have a basic scenario in Hadoop:
All mappers send all values to the same key. Therefore all values end up on the same reducer.
However, when I iterate the values in the reducer, the iterator does not process all of the entries.
For example, I could have the following code:
while (values.hasNext())
{
    result = result + values.next().toString() + "\n";
}
// Assume that all values sent to this reducer are now in the 'result' variable
do_important_stuff(result);
I would like to accumulate the associated values and then process the data in the function 'do_important_stuff()'. But I am not able to do so: the while loop terminates too soon.
Am I missing a crucial point about Hadoop? Is my assumption wrong?
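For context, the mapper side of this scenario might look like the following minimal sketch. It assumes the old org.apache.hadoop.mapred API and plain text input; the class name and the constant key are illustrative assumptions, not code from the question:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SingleKeyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // One shared key means every value is routed to the same reducer.
    private static final Text ALL = new Text("all");

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        output.collect(ALL, line);
    }
}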
Upvotes: 0
Views: 514
Reputation: 1
The problem seems to be caused by storing references instead of copies: Hadoop reuses the same Text instance for every value the iterator returns, so references saved during the loop all end up pointing at the last value.
With an ArrayList as the accumulator and a copy made of every value, e.g.:
result = new ArrayList<Text>();
while (values.hasNext())
{
    result.add(new Text(values.next()));
}
the loop runs to completion with all the desired values in the list.
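For reference, a minimal sketch of a complete reducer along these lines, written against the old org.apache.hadoop.mapred API that the hasNext()/next() calls suggest. The class name is an assumption, and the final collect() stands in for the question's do_important_stuff():

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CollectAllReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        List<Text> result = new ArrayList<Text>();
        while (values.hasNext()) {
            // Hadoop hands back the same Text instance on every next() call,
            // so copy the value rather than storing the reference.
            result.add(new Text(values.next()));
        }
        // Hypothetical stand-in for do_important_stuff(): emit the number
        // of values accumulated for this key.
        output.collect(key, new Text(Integer.toString(result.size())));
    }
}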
Upvotes: 0
Reputation: 88378
You are controlling the loop with values.hasNext() but advancing with rows.next(). Are rows and values the same object? I suspect a typo. :)
Upvotes: 1