p.magalhaes

Reputation: 8374

Hadoop - MultipleInputs

I am trying to use MultipleInputs from Hadoop. All of my inputs will be read with FixedLengthInputFormat.

MultipleInputs.addInputPath(job,
        new Path(rootDir),
        FixedLengthInputFormat.class,
        OneToManyMapper.class);

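The problem is that each input has fixed-width records, but the record length differs from one input to another, while the record length is set only once on the job configuration: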

config.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, ??);

Is there any way to pass a different FIXED_RECORD_LENGTH for each mapper when using MultipleInputs?

Thanks!

Upvotes: 1

Views: 424

Answers (1)

p.magalhaes

Reputation: 8374

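Here is the solution: subclass FixedLengthInputFormat and override createRecordReader, so the record length can be chosen per input instead of being read from the single job-wide setting: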

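import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthRecordReader;

public class CustomFixedLengthInputFormat extends FixedLengthInputFormat {

    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException,
            InterruptedException {
        // Here I can control the record length for this particular input.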
        int recordLength = ??;// getRecordLength(context.getConfiguration());
        if (recordLength <= 0) {
            throw new IOException(
                    "Fixed record length "
                            + recordLength
                            + " is invalid.  It should be set to a value greater than zero");
        }

        System.out.println("Record Length: " + recordLength);

        return new FixedLengthRecordReader(recordLength);
    }

}
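One possible way to fill in the `??`, as a rough sketch and not a definitive implementation: keep a small table that maps each input directory to its record length and look it up by the path of the file being read. The `RecordLengthResolver` class and its `register`/`resolve` methods below are hypothetical names I made up for illustration; they are not part of Hadoop. I am also assuming that the split handed to `createRecordReader` is a `FileSplit`, so that something like `((FileSplit) split).getPath().toUri().getPath()` gives the file path to pass in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): maps input directories to record lengths.
final class RecordLengthResolver {

    // Directory prefix (always ending in '/') -> fixed record length in bytes.
    private final Map<String, Integer> lengthsByDir = new LinkedHashMap<>();

    // Registers the record length to use for every file under the given directory.
    RecordLengthResolver register(String dir, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException(
                    "Record length must be greater than zero: " + recordLength);
        }
        lengthsByDir.put(normalize(dir), recordLength);
        return this;
    }

    // Returns the length registered for the longest matching directory prefix,
    // or -1 when no registered directory contains the file.
    int resolve(String filePath) {
        int bestLength = -1;
        int bestPrefixSize = -1;
        for (Map.Entry<String, Integer> entry : lengthsByDir.entrySet()) {
            String prefix = entry.getKey();
            if (filePath.startsWith(prefix) && prefix.length() > bestPrefixSize) {
                bestPrefixSize = prefix.length();
                bestLength = entry.getValue();
            }
        }
        return bestLength;
    }

    // Appends a trailing '/' so that "/data/a" does not match "/data/ab/...".
    private static String normalize(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }
}
```

Because `resolve` returns -1 for an unknown path, the existing `recordLength <= 0` check in `createRecordReader` would still reject inputs that were never registered. How the table gets to the tasks (for example, through custom keys in the job configuration) is left open here.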

Upvotes: 1
