Defining one writable for whole Mapper/Reducer

Question

I read somewhere that there can be a performance gain if the output Writables are created once, when the Mapper/Reducer is instantiated, and each call to map/reduce only sets their value instead of creating a new Writable for every output record.

For example (in pseudo-code):

IntWritable idWritable = new IntWritable();

map(){
     idWritable.set(outputValue);
     emit(idWritable);
}

is more efficient than:

map(){
     IntWritable idWritable = new IntWritable(outputValue);
     emit(idWritable);
}

Is this true? Is it really good practice to create the output Writables once, when the Mapper/Reducer is created, and reuse them for all output records?

Ben Watson · Accepted Answer

Yes, this is true. In your second example you create a brand new IntWritable for every record you process. Each one costs a memory allocation, and the discarded IntWritable has to be garbage collected at some point. If you're processing millions of records and using a complex Writable (say, with several ints and Strings), the heap can fill up very quickly.

Alternatively, if you just reset the value on the same object, no new memory needs to be allocated and there is nothing to garbage collect. This is generally faster, though I'd recommend running your own experiments to confirm it for your job.
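
To make the reuse pattern concrete, here is a minimal sketch in plain Java. It does not use Hadoop: SketchIntWritable is a hypothetical stand-in for IntWritable, and emit is a hypothetical stand-in for context.write. It assumes that the output is serialized to bytes at the moment it is emitted, which as far as I know is what Hadoop's map output path does, and which is why overwriting the same object afterwards should be safe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for org.apache.hadoop.io.IntWritable, so the sketch
// runs without Hadoop on the classpath.
final class SketchIntWritable {
    private int value;

    void set(int value) { this.value = value; }

    void write(DataOutputStream out) throws IOException { out.writeInt(value); }
}

class WritableReuseSketch {
    // Created once per "mapper" instance and reused for every record.
    private final SketchIntWritable idWritable = new SketchIntWritable();

    // Stand-in for the framework's output buffer (an assumption of this sketch).
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(buffer);

    // Stand-in for context.write(...): copies the current value out as bytes
    // right away, so later changes to the object do not affect this record.
    private void emit(SketchIntWritable writable) throws IOException {
        writable.write(out);
    }

    void map(int outputValue) throws IOException {
        idWritable.set(outputValue); // overwrite in place, no new allocation
        emit(idWritable);
    }

    // Runs the "mapper" over the inputs and decodes what was emitted.
    static int[] run(int[] inputs) {
        try {
            WritableReuseSketch mapper = new WritableReuseSketch();
            for (int v : inputs) {
                mapper.map(v);
            }
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(mapper.buffer.toByteArray()));
            int[] emitted = new int[inputs.length];
            for (int i = 0; i < emitted.length; i++) {
                emitted[i] = in.readInt();
            }
            return emitted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Every record keeps its own value even though one object was reused.
        System.out.println(Arrays.toString(run(new int[] {1, 2, 3})));
    }
}
```

One caveat with this pattern: it only works because each record is copied out when it is emitted. If your own code keeps references to a reused Writable (for example by adding it to a list inside the Mapper), every entry will end up pointing at the same object holding the last value.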
