Reputation: 3864
I read somewhere that there could be performance gain if we define output writables when Mapper/Reducer is created and that in Mapper/Reducer we should only set the value of that writable instead of creating the writable for each output record.
For example (in pseudo-code):
IntWritable idWritable = new IntWritable();
map(){
idWritable.setValue(outputValue);
emit(idWritable);
}
Is more optimal than:
map(){
IntWritable idWritable = new IntWritable(outputValue);
emit(idWritable);
}
Is this true? Is it really a good practice to define output writables when creating a Mapper/Reducer, which will be used for all output records?
Upvotes: 2
Views: 36
Reputation: 5531
Yes this is true. In your second example you're creating a brand new IntWritable
every time you process a record. This requires overhead for new memory allocation, and also means that the old IntWritable
has to be garbage collected at some point. If you're processing millions of records and using a complex Writable
(say with several ints
and Strings
), the heap can be filled very quickly.
Alternately, by just re-setting the value within the same object, no new memory needs to be allocated and no garbage collection needs to take place. It's much faster, but I can recommend doing your own experiments to confirm this.
Upvotes: 1