Reputation: 653
I am writing a Hadoop app where I want to read the input file as a whole and hand it to many mappers, letting each mapper do part of the job. Here is my FileInputFormat. I have to make isSplitable return false so that I can read the whole file, but this means only one mapper gets initialized. Can anyone tell me how to read the input file as a whole and send it to more than one mapper for processing?
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.*;

public class WholeFileInputFormat extends FileInputFormat<PairWritable, BytesWritable> {

    // Keep the file in one piece so the record reader sees its entire contents.
    @Override
    protected boolean isSplitable(FileSystem fs, Path filename) {
        return false;
    }

    @Override
    public RecordReader<PairWritable, BytesWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }
}
Upvotes: 0
Views: 200
Reputation: 5239
Add an implementation of getSplits to WholeFileInputFormat that returns as many duplicates of each (unsplit) split as you want. The framework starts one mapper per split, so every duplicate gets its own mapper.
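A minimal sketch of that idea against the old mapred API shown in the question; the property name wholefile.num.mappers is made up here, and you can pass the desired mapper count in any way you like:

@Override
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    // isSplitable returns false, so the parent produces exactly one
    // unsplit FileSplit per input file.
    InputSplit[] originals = super.getSplits(job, numSplits);
    // "wholefile.num.mappers" is a hypothetical job property; read the
    // duplicate count from wherever suits your job setup.
    int copies = job.getInt("wholefile.num.mappers", 1);
    InputSplit[] duplicated = new InputSplit[originals.length * copies];
    for (int i = 0; i < originals.length; i++) {
        for (int c = 0; c < copies; c++) {
            // Every copy covers the whole file, so each mapper that
            // processes one of them reads the full contents.
            duplicated[i * copies + c] = originals[i];
        }
    }
    return duplicated;
}

Each of those mappers then reads the same whole file; inside the map task you can decide which share of the work is yours, for example from the task's partition number (mapred.task.partition in the old API).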
Upvotes: 3