Reputation: 8374
I am trying to use MultipleInputs from Hadoop. All of my mappers will read their input with FixedLengthInputFormat.
MultipleInputs.addInputPath(job,
        new Path(rootDir),
        FixedLengthInputFormat.class,
        OneToManyMapper.class);
The problem is that each mapper reads fixed-width records, but the record width differs from one mapper to the next, and the length is set only once on the configuration:
config.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, ??);
Is there any way to pass a different FIXED_RECORD_LENGTH for each mapper when using MultipleInputs?
Thanks!
Upvotes: 1
Views: 424
Reputation: 8374
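Here is the solution: extend FixedLengthInputFormat and override createRecordReader, so the record length is chosen there instead of being read from the single FIXED_RECORD_LENGTH setting: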
public class CustomFixedLengthInputFormat extends FixedLengthInputFormat {
    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException,
            InterruptedException {
        // Here I can control the record length!
        // The parent class would call getRecordLength(context.getConfiguration()).
        int recordLength = ??;
        if (recordLength <= 0) {
            throw new IOException(
                    "Fixed record length "
                    + recordLength
                    + " is invalid. It should be set to a value greater than zero");
        }
        System.out.println("Record Length: " + recordLength);
        return new FixedLengthRecordReader(recordLength);
    }
}
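The snippet above leaves open how the length is actually picked. One possible way, shown here only as a sketch, is to keep a table of input directory to record length and look up the file being read. The class RecordLengthResolver, its resolve method, and the directory names are hypothetical and not part of Hadoop. The sketch works on plain path strings; I am assuming that inside createRecordReader the split can be cast to FileSplit, so that something like ((FileSplit) split).getPath().toUri().getPath() gives the string to look up, and that the table itself is loaded from your own job configuration keys.

```java
import java.util.Map;

/** Hypothetical helper (not a Hadoop class): maps an input file path to a record length. */
final class RecordLengthResolver {
    private RecordLengthResolver() {}

    /**
     * Returns the record length registered for the longest directory prefix
     * that contains filePath, or -1 when no registered directory matches.
     */
    static int resolve(Map<String, Integer> lengthByDir, String filePath) {
        int best = -1;
        int bestPrefixLen = -1;
        for (Map.Entry<String, Integer> e : lengthByDir.entrySet()) {
            // Normalize to a trailing slash so "/data/a" does not match "/data/ab/...".
            String dir = e.getKey().endsWith("/") ? e.getKey() : e.getKey() + "/";
            if (filePath.startsWith(dir) && dir.length() > bestPrefixLen) {
                best = e.getValue();
                bestPrefixLen = dir.length();
            }
        }
        return best;
    }
}
```

Returning -1 for an unknown path means the existing recordLength <= 0 check in createRecordReader still rejects inputs that have no registered length.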
Upvotes: 1