Reputation: 114
In my Hadoop code I have 4 reducers, so I always get 4 output files, which is normal since each reducer writes its result to its own file. My question: how can I get one and only one output file?
The problem is that I have an iterative MapReduce job that takes an input file, divides it into chunks, and gives each chunk to a mapper. That is why I need to gather all the reducers' results into a single output file, so that I can split that file evenly into 4 parts, each of which is then given to one mapper, and so on.
Upvotes: 2
Views: 1461
Reputation: 3433
You could try MultipleOutputs, which lets you specify the output file that each reducer should write to.

For example, in your reducer code:
...
private MultipleOutputs<YourKey, YourValue> out;

@Override
public void setup(Context context) {
    out = new MultipleOutputs<YourKey, YourValue>(context);
}

@Override
public void reduce(YourKey key, Iterable<YourValue> values, Context context)
        throws IOException, InterruptedException {
    ...
    // Instead of writing through context.write(key, result),
    // write through MultipleOutputs and pass an explicit base path:
    out.write(key, result, "path/filename");
}

@Override
public void cleanup(Context context) throws IOException, InterruptedException {
    out.close();
}
...
For this to work, you also need to set up the job configuration accordingly:
......
// All real output goes through MultipleOutputs, and LazyOutputFormat
// creates files lazily, so the reducers do not leave behind empty
// default part-r-nnnnn files:
job.setOutputFormatClass(NullOutputFormat.class);
// Pass a concrete format here (FileOutputFormat itself is abstract):
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("output"));
......
In this case, each reducer's output will be written under output/path/filename.
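Note that MultipleOutputs appends the task's part suffix to the base path passed to write(), so with 4 reduce tasks you should still expect files like output/path/filename-r-00000 through output/path/filename-r-00003 rather than one physical file. If you truly need a single file for the next iteration, one common follow-up (not part of the answer above; the paths here are just the illustrative ones from this example) is to concatenate the parts afterwards:

hadoop fs -getmerge output/path merged.txt
hadoop fs -put merged.txt next-iteration-input/

getmerge concatenates every file under the source directory into one local file, which you can then put back into HDFS as the input of the next iteration.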
Upvotes: 0
Reputation: 1279
You can simply configure the number of reducers you want. When defining your job, use this:

job.setNumReduceTasks(1);
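For context, here is a minimal driver sketch showing where that call goes (MyMapper, MyReducer, and the key/value and path choices are placeholders, not taken from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleReducerDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "single-reducer-job");
        job.setJarByClass(SingleReducerDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Force a single reduce task: all map output is sent to one
        // reducer, so the job writes exactly one part-r-00000 file.
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Keep in mind that a single reducer funnels the entire reduce phase through one task, which can become a bottleneck on large inputs.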
Upvotes: -1