iCode
iCode

Reputation: 4338

Control the Reducer Result Output File/Bucket

I have an application where I would like to make my reducers (I have several of them for a map/reduce job) to record their outputs into different files on the HDFS depending on the key coming to them for processing. So if the reducer sees a key of say type A, apply the reduce the logic but tell Hadoop to put the result into the hdfs file belonging to type A result and so on. Obviously multiple reducers can will be outputting different portion of the type A result and each reducer can end up working on any type like A or B but tell hadoop to write the result into the type A bucket or something

Is this possible?

Upvotes: 0

Views: 149

Answers (1)

Judge Mental
Judge Mental

Reputation: 5239

MultipleOutputs is almost what you are looking for (assuming you are at least at version 0.21). In my own work I have used a clone of this class modified to be more flexible about naming conventions to send output to different folders/files based on anything I want, including aspects of the input records (keys or values). As is, the class has some draconian restrictions on what names you can give to the outputs.

Upvotes: 1

Related Questions