Reputation: 2251
I have a text file and a parser that parses its lines and stores the result in my customSplitInput. The parsing happens in my custom FileInputFormat, so my splits are custom. Right now I have 2 splits, and each split contains a list of my data.
However, my mapper function is being called repeatedly on the same split. I thought the mapper function would be called only once per split?
I don't know if this is relevant, but my custom InputSplit returns a fixed number from getLength() and an empty string array from getLocations(). I am unsure what to return from these.
@Override
public RecordReader<LongWritable, ArrayWritable> createRecordReader(
        InputSplit input, TaskAttemptContext taskContext)
        throws IOException, InterruptedException {
    logger.info(">>> Creating Record Reader");
    CustomRecordReader recordReader = new CustomRecordReader(
            (EntryInputSplit) input);
    return recordReader;
}
Upvotes: 1
Views: 743
Reputation: 2725
map() is called once for every record returned by the RecordReader in (or referenced by) your InputFormat, not once per split. For example, with TextInputFormat, map() is called for every line in the input, even though there are usually many lines in a split.
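To illustrate, here is a minimal sketch with no Hadoop dependency. In the new API, Mapper.run() loops while context.nextKeyValue() returns true and calls map() once per iteration; the runMapper method below imitates that loop. SimpleRecordReader, PerEntryReader and WholeSplitReader are hypothetical stand-ins invented for this example, not Hadoop classes and not your CustomRecordReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapCallsPerSplit {

    /** Hypothetical, stripped-down analogue of Hadoop's RecordReader. */
    interface SimpleRecordReader<V> {
        boolean nextKeyValue();   // advance; returns false once the split is exhausted
        V getCurrentValue();      // value of the current record
    }

    /** Emits each entry of the split as its own record. */
    static class PerEntryReader implements SimpleRecordReader<String> {
        private final Iterator<String> it;
        private String current;

        PerEntryReader(List<String> split) { this.it = split.iterator(); }

        public boolean nextKeyValue() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }

        public String getCurrentValue() { return current; }
    }

    /** Emits the whole split as a single record. */
    static class WholeSplitReader implements SimpleRecordReader<List<String>> {
        private final List<String> split;
        private boolean consumed = false;

        WholeSplitReader(List<String> split) { this.split = split; }

        public boolean nextKeyValue() {
            if (consumed) return false;  // second call ends the loop
            consumed = true;
            return true;
        }

        public List<String> getCurrentValue() { return split; }
    }

    /** Imitates the loop in Mapper.run(): one map() call per record. Returns the call count. */
    static <V> int runMapper(SimpleRecordReader<V> reader, List<V> seen) {
        int calls = 0;
        while (reader.nextKeyValue()) {
            seen.add(reader.getCurrentValue()); // stands in for map(key, value, context)
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("a", "b", "c");
        System.out.println(runMapper(new PerEntryReader(split), new ArrayList<>()));   // prints 3
        System.out.println(runMapper(new WholeSplitReader(split), new ArrayList<>())); // prints 1
    }
}
```

So if you want map() to run once per split, one option is to have your RecordReader return true from nextKeyValue() exactly once and hand over the whole list as the value. Also note that if nextKeyValue() never returns false, map() will keep being called on the same split.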
Upvotes: 2