Reputation: 114
In my Hadoop code I have 4 reducers, so I always get 4 output files, which is normal since each reducer writes its result to its own file. My question: how can I get one and only one output file?
The problem is that I have an iterative MapReduce job that takes an input file, divides it into chunks, and gives each chunk to a mapper. That is why I need to gather all the reducers' results into a single output file, so that I can split that file evenly into 4 parts, each of which is then given to one mapper, and so on.
Upvotes: 2
Views: 1461
Reputation: 3433
You could try MultipleOutputs, which lets you specify the output file that each reducer should write to.

For example, in your reducer code:
...
private MultipleOutputs<YourKey, YourValue> out;

@Override
public void setup(Context context) {
    out = new MultipleOutputs<YourKey, YourValue>(context);
}

@Override
public void reduce(YourKey key, Iterable<YourValue> values, Context context)
        throws IOException, InterruptedException {
    ...
    // Instead of writing through context.write(key, result),
    // write through MultipleOutputs and pass an explicit base path:
    out.write(key, result, "path/filename");
}

@Override
public void cleanup(Context context) throws IOException, InterruptedException {
    out.close();
}
...
For this to work, you also need to set up the job configuration accordingly:
......
// All real output goes through MultipleOutputs, and LazyOutputFormat
// creates files lazily, so the reducers do not leave behind empty
// default part-r-nnnnn files:
job.setOutputFormatClass(NullOutputFormat.class);
// Pass a concrete format here (FileOutputFormat itself is abstract):
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("output"));
......
In this case, each reducer's output will be written under output/path/filename.
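Note that MultipleOutputs appends the task's part suffix to the base path passed to write(), so with 4 reduce tasks you should still expect files like output/path/filename-r-00000 through output/path/filename-r-00003 rather than one physical file. If you truly need a single file for the next iteration, one common follow-up (not part of the answer above; the paths here are just the illustrative ones from this example) is to concatenate the parts afterwards:

hadoop fs -getmerge output/path merged.txt
hadoop fs -put merged.txt next-iteration-input/

getmerge concatenates every file under the source directory into one local file, which you can then put back into HDFS as the input of the next iteration.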
Upvotes: 0
Reputation: 1279
You can simply configure the number of reducers you want. When defining your job, use this:

job.setNumReduceTasks(1);
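For context, here is a minimal driver sketch showing where that call goes (MyMapper, MyReducer, and the key/value and path choices are placeholders, not taken from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleReducerDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "single-reducer-job");
        job.setJarByClass(SingleReducerDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Force a single reduce task: all map output is sent to one
        // reducer, so the job writes exactly one part-r-00000 file.
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Keep in mind that a single reducer funnels the entire reduce phase through one task, which can become a bottleneck on large inputs.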
Upvotes: -1