Reputation: 29143
At my company, developers go to great lengths to avoid creating objects inside mappers and reducers: for example, working with the basic Avro record (accessing fields by position), or working with byte arrays and streams instead of objects.
This sounds like over-optimization to me. Java-based servers need to perform well too, yet people don't program them this way.
So what is right?
Upvotes: 0
Views: 373
Reputation: 176
I don't think you can call it right or wrong, but it may be overkill. You're (presumably) sacrificing readability and maintainability for some performance gain. Remember that if you get your reducer to run 1 second faster and the job reduces on 100 nodes, the job doesn't finish 100 seconds faster, only 1 second faster, because the reducers run in parallel (assuming an even distribution of keys and that all resources are available at the start).
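To make the arithmetic concrete, here is a rough sketch in plain Java (no Hadoop involved). The class name, method names, and the 60-second baseline are made up for illustration. It assumes all 100 reducers start at once and do equal work, so the phase ends when the slowest one ends.

```java
import java.util.Arrays;

public class ReduceWallClock {
    // With every reducer running in parallel, the phase lasts as long as the slowest reducer.
    static double wallClockSeconds(double[] perReducerSeconds) {
        return Arrays.stream(perReducerSeconds).max().orElse(0.0);
    }

    // Total compute consumed across the cluster is the sum over all reducers.
    static double totalCpuSeconds(double[] perReducerSeconds) {
        return Arrays.stream(perReducerSeconds).sum();
    }

    public static void main(String[] args) {
        double[] before = new double[100];
        double[] after = new double[100];
        Arrays.fill(before, 60.0); // hypothetical baseline: 60 s per reducer
        Arrays.fill(after, 59.0);  // each reducer is 1 s faster

        // The job finishes 1 second sooner...
        System.out.println(wallClockSeconds(before) - wallClockSeconds(after)); // 1.0
        // ...although the cluster as a whole saves 100 CPU-seconds.
        System.out.println(totalCpuSeconds(before) - totalCpuSeconds(after));   // 100.0
    }
}
```

So the saving matters if you pay for cluster time in aggregate, but much less if you only care when the job completes.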
Personally, I declare class variables and initialize them in my constructor (see tip #6). Then I set their values rather than creating new objects inside the mapper or reducer, so the allocation cost is paid only once. You just have to make sure to clear the object at the start of the map or reduce method so nothing carries over from a previous invocation.
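A minimal sketch of that reuse pattern, written in plain Java so it runs without Hadoop on the classpath. `MutableRecord` is a hypothetical stand-in for a Hadoop `Writable` such as `Text`, and the `List<String>` sink stands in for `context.write(...)`. In a real job the framework serializes the object during the write call, which is what makes reusing the same instance safe.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusingMapper {
    // Hypothetical mutable holder, playing the role of a reusable Writable.
    static final class MutableRecord {
        final StringBuilder key = new StringBuilder();
        int count;

        void clear() {
            key.setLength(0);
            count = 0;
        }
    }

    // Allocated once per mapper instance, not once per record.
    private final MutableRecord out;

    public ReusingMapper() {
        out = new MutableRecord();
    }

    // Emits one "token<TAB>1" entry per whitespace-separated token.
    public void map(String line, List<String> sink) {
        for (String token : line.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            out.clear();           // avoid carryover from the previous record
            out.key.append(token); // set fields instead of allocating a new holder
            out.count = 1;
            // Copy the current state out, as serialization would in a real job.
            sink.add(out.key + "\t" + out.count);
        }
    }

    public static void main(String[] args) {
        ReusingMapper mapper = new ReusingMapper();
        List<String> sink = new ArrayList<>();
        mapper.map("foo bar", sink);
        mapper.map("baz", sink);
        System.out.println(sink.size()); // 3
    }
}
```

If you drop the `clear()` call, the second token would be appended to the first one's key, which is exactly the carryover problem mentioned above.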
Upvotes: 1