Reputation: 3001
I would like to have an ArrayList that holds references to the objects received inside the reduce function.
@Override
public void reduce( final Text pKey,
final Iterable<BSONWritable> pValues,
final Context pContext )
throws IOException, InterruptedException{
final ArrayList<BSONWritable> bsonObjects = new ArrayList<BSONWritable>();
for ( final BSONWritable value : pValues ){
bsonObjects.add(value);
//do some calculations.
}
for ( final BSONWritable value : bsonObjects ){
//do something else.
}
}
The problem is that bsonObjects.size() returns the correct number of elements, but every element of the list is equal to the last inserted element. For example, if these elements are inserted:
{id:1}
{id:2}
{id:3}
then bsonObjects will hold 3 items, but all of them will be {id:3}. Is there a problem with this approach? Any idea why this happens? I have tried changing the List to a Map, but then only one element was added to the map. I have also tried moving the declaration of bsonObjects to class level (global), but the same behavior occurs.
Upvotes: 0
Views: 210
Reputation: 2669
This is documented behavior. The iterator behind pValues re-uses a single BSONWritable instance and overwrites its contents on every step of the loop. Calling add() on bsonObjects stores a reference, not a copy, so every entry in the list points to that same instance and they all show the last value read. This approach allows Hadoop to save memory.
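The reuse described above can be reproduced without Hadoop. The following is a minimal sketch, not Hadoop's actual implementation: Holder is a hypothetical stand-in for BSONWritable, and reusingValues is a hypothetical iterable that hands out one shared instance, roughly the way the reducer's value iterator does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ReuseDemo {
    // Hypothetical mutable record standing in for BSONWritable.
    static final class Holder {
        int id;
        Holder() {}
        // Copy constructor: creates an independent instance.
        Holder(Holder other) { this.id = other.id; }
    }

    // Returns an Iterable that hands out the SAME Holder on every step,
    // overwriting its field each time (an assumption modelling the reducer).
    static Iterable<Holder> reusingValues(int... ids) {
        final Holder shared = new Holder();
        return () -> new Iterator<Holder>() {
            private int next = 0;
            @Override public boolean hasNext() { return next < ids.length; }
            @Override public Holder next() {
                shared.id = ids[next++];
                return shared;
            }
        };
    }

    // Stores the reference itself: every list entry is the shared instance.
    static List<Integer> collectAliased() {
        List<Holder> stored = new ArrayList<>();
        for (Holder value : reusingValues(1, 2, 3)) {
            stored.add(value);
        }
        return idsOf(stored);
    }

    // Stores a copy made inside the loop: every list entry is distinct.
    static List<Integer> collectCopied() {
        List<Holder> stored = new ArrayList<>();
        for (Holder value : reusingValues(1, 2, 3)) {
            stored.add(new Holder(value));
        }
        return idsOf(stored);
    }

    static List<Integer> idsOf(List<Holder> holders) {
        List<Integer> ids = new ArrayList<>();
        for (Holder h : holders) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(collectAliased()); // [3, 3, 3]
        System.out.println(collectCopied());  // [1, 2, 3]
    }
}
```

The only difference between the two methods is whether the loop stores the reference it was handed or a fresh copy, which is the same choice the reducer has to make.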
Inside the first loop you should create a new BSONWritable instance that is a deep copy of value, and add that copy to bsonObjects. Assigning value to another variable is not enough, because that only copies the reference.
Try this:
for ( final BSONWritable value : pValues ){
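// A plain assignment (BSONWritable v = value) would only copy the reference.
// org.apache.hadoop.io.WritableUtils.clone serializes the value into a fresh instance
// (assuming BSONWritable round-trips correctly through its Writable serialization).
BSONWritable v = WritableUtils.clone(value, pContext.getConfiguration());
bsonObjects.add(v);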
//do some calculations.
}
for ( final BSONWritable value : bsonObjects ){
//do something else.
}
Then you will be able to iterate through bsonObjects in the second loop and retrieve each distinct value.
Be careful, though: if you deep-copy every value, all the values for the key in this reducer will need to fit in memory.
Upvotes: 2