Instinct

Reputation: 2251

What determines how many times map() will get called?

I have a text file and a parser that parses its lines and stores the results in my customSplitInput. The parsing happens in my custom FileInputFormat, so my splits are custom. Right now I have 2 splits, and each split contains a list of my data.

However, my mapper function is being called repeatedly on the same split. I thought the mapper function would be called only once per split?

I don't know if this is relevant, but my custom InputSplit returns a fixed number from getLength() and an empty string array from getLocations(). I am unsure what these should return.

@Override
public RecordReader<LongWritable, ArrayWritable> createRecordReader(
        InputSplit input, TaskAttemptContext taskContext)
        throws IOException, InterruptedException {
    logger.info(">>> Creating Record Reader");
    CustomRecordReader recordReader = new CustomRecordReader(
            (EntryInputSplit) input);
    return recordReader;
}

Upvotes: 1

Views: 743

Answers (1)

Jeremy Beard

Reputation: 2725

map() is called once for every record returned by the RecordReader in (or referenced by) your InputFormat, not once per split. For example, with TextInputFormat, map() is called for every line in the input, even though there are usually many lines in a split.
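
To make the record-versus-split distinction concrete, here is a minimal plain-Java sketch of the loop that drives map(). It does not use Hadoop itself: SimpleRecordReader, CountingMapper, and countMapCalls are hypothetical stand-ins invented for illustration, and only the shape of the run() loop (keep calling map() while the reader reports another record) is meant to mirror what Hadoop's Mapper does.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapCallDemo {

    /** Hypothetical stand-in for a RecordReader over a single split. */
    static class SimpleRecordReader {
        private final Iterator<String> records;
        private String current;

        SimpleRecordReader(List<String> split) {
            this.records = split.iterator();
        }

        // Plays the role of nextKeyValue(): advance and report whether a record exists.
        boolean nextKeyValue() {
            if (!records.hasNext()) {
                return false;
            }
            current = records.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }
    }

    /** Hypothetical mapper that only counts how often map() is invoked. */
    static class CountingMapper {
        int mapCalls = 0;

        void map(String value) {
            mapCalls++;
        }

        // Same shape as the driver loop: one map() call per record the reader yields.
        void run(SimpleRecordReader reader) {
            while (reader.nextKeyValue()) {
                map(reader.getCurrentValue());
            }
        }
    }

    /** Runs one mapper per split and returns the total number of map() calls. */
    public static int countMapCalls(List<List<String>> splits) {
        int total = 0;
        for (List<String> split : splits) {
            CountingMapper mapper = new CountingMapper(); // one map task per split
            mapper.run(new SimpleRecordReader(split));
            total += mapper.mapCalls;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "b", "c"),   // split 1: three records
                Arrays.asList("d", "e"));       // split 2: two records
        System.out.println(countMapCalls(splits)); // prints 5
    }
}
```

With two splits holding three and two records, map() runs five times, not twice. If the goal is a single map() call per split, one option (an assumption about your setup, not something confirmed by the question) is to have the custom RecordReader return true from nextKeyValue() exactly once and hand the whole list over as that one value, for example as the ArrayWritable.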

Upvotes: 2
