Reputation: 4338
I have an application where I would like my reducers (there are several of them in a map/reduce job) to write their output into different files on HDFS, depending on the key they are processing. So if a reducer sees a key of, say, type A, it should apply the reduce logic but tell Hadoop to put the result into the HDFS file belonging to type A results, and so on. Obviously, multiple reducers can each be outputting a different portion of the type A result, and each reducer can end up working on any type, A or B, but it should tell Hadoop to write the result into the bucket for that type.
Is this possible?
Upvotes: 0
Views: 149
Reputation: 5239
MultipleOutputs is almost what you are looking for (assuming you are at least at version 0.21). In my own work I have used a clone of this class, modified to be more flexible about naming conventions, to send output to different folders/files based on anything I want, including aspects of the input records (keys or values). As is, the class has some draconian restrictions on what names you can give to the outputs.
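To make the routing idea concrete, here is a minimal sketch of the key-to-output-name step in plain Java, with no Hadoop dependency. Everything in it is an assumption for illustration: the `OutputRouter` class and its methods are hypothetical, and it supposes keys look like `<type>:<rest>` (for example `A:123`). It strips anything other than letters and digits from the type, since the naming restrictions mentioned above are the likely stumbling block.

```java
// Hypothetical helper (not part of Hadoop): derives a per-type base output
// path from a reducer key.
final class OutputRouter {
    private OutputRouter() {}

    // Assumption: keys look like "<type>:<rest>", e.g. "A:123".
    // A key with no ':' is treated as being its own type.
    static String typeOf(String key) {
        int sep = key.indexOf(':');
        String type = sep < 0 ? key : key.substring(0, sep);
        // Keep only ASCII letters and digits so the type is safe in an output name.
        String cleaned = type.replaceAll("[^A-Za-z0-9]", "");
        // Fall back to a fixed bucket when nothing usable is left.
        return cleaned.isEmpty() ? "unknown" : cleaned;
    }

    // Base path of the form "<type>/part", relative to the job output directory.
    static String basePathFor(String key) {
        return typeOf(key) + "/part";
    }
}
```

In a reducer built on the newer-API `org.apache.hadoop.mapreduce.lib.output.MultipleOutputs`, the returned string would be passed as the `baseOutputPath` argument of `write(key, value, baseOutputPath)`, so that each reducer should end up writing its own file under a per-type folder (something like `A/part-r-00000`). Whether that overload is available depends on your Hadoop version, so check it against the release you run.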
Upvotes: 1