Reputation: 543
My program was running fine for smaller inputs, but when I increase the size of the input it seems that line 210 (context.nextKeyValue();) throws an IndexOutOfBoundsException. Below is the setup method of the mapper. I call nextKeyValue() in there once because the first line of each file is a header. Splitting of files is set to false because of the headers. Does it have to do with memory? How can I solve this?
Also, the error message below is displayed 68 times, even though I've set the max map attempts to 3, and there are 55 splits. Shouldn't it be displayed 55 times, or maybe 55*3, or maybe just 3? How does it work?
@Override
protected void setup(Context context) throws IOException, InterruptedException
{
    Configuration conf = context.getConfiguration();
    DupleSplit fileSplit = (DupleSplit) context.getInputSplit();

    //first line is header. Indicates the first digit of the solution.
    context.nextKeyValue(); // <---- LINE 210

    URI[] uris = context.getCacheFiles();
    int num_of_colors = Integer.parseInt(conf.get("num_of_colors"));
    int order = fileSplit.get_order();
    int first_digit = Integer.parseInt(context.getCurrentValue().toString());

    //perm_path = conf.get(Integer.toString(num_of_colors - order - 1));
    int offset = Integer.parseInt(conf.get(Integer.toString(num_of_colors - order - 1)));
    uri = uris[offset];
    Path perm_path = new Path(uri.getPath());
    perm_name = perm_path.getName().toString();

    String pair_variables = "";
    for (int i = 1; i <= num_of_colors; i++)
        pair_variables += "X_" + i + "_" + (num_of_colors - order) + "\t";
    for (int i = 1; i < num_of_colors; i++)
        pair_variables += "X_" + i + "_" + (num_of_colors - order - first_digit) + "\t";
    pair_variables += "X_" + num_of_colors + "_" + (num_of_colors - order - first_digit);

    context.write(new Text(pair_variables), null);
}
Here's the error log:
Error: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkBounds(Buffer.java:559)
at java.nio.ByteBuffer.get(ByteBuffer.java:668)
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:279)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:168)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at produce_data_hdfs$input_mapper.setup(produce_data_hdfs.java:210)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Upvotes: 1
Views: 1112
Reputation: 7462
I have never seen this method called from setup() before, and it seems that you don't even need it, as you don't store its result in any variable.
Why don't you just skip the first (key, value) pair inside the map() method? You can easily do that by keeping a counter, initialized to 0 in the setup() method and incremented at the beginning of map(). Then skip your map computations when this counter is equal to 1:
private int counter;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    counter = 0;
    ...
}

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    if (++counter == 1) {
        return; // skip the header line
    }
    ... // your existing map code goes here
}
The error message is shown 68 times, maybe because it is shown once for each map task that can run concurrently (as many as the available map slots in your cluster); those tasks are then re-executed (each task twice), until some of them fail for good, causing the whole job to fail (there is a threshold on how many tasks may fail before the whole job fails).
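For reference, here is a minimal sketch (assuming Hadoop 2.x property names) of how those limits are usually configured on the job: mapreduce.map.maxattempts controls how often a single task is retried, and mapreduce.map.failures.maxpercent controls how many tasks may fail permanently before the whole job fails.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "produce_data_hdfs");

// Retry each failed map task up to 3 times before marking it as permanently failed.
job.getConfiguration().setInt("mapreduce.map.maxattempts", 3);

// Tolerate up to 10% of map tasks failing permanently before the job itself fails
// (the default is 0, so a single permanently failed task kills the job).
job.getConfiguration().setInt("mapreduce.map.failures.maxpercent", 10);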
Upvotes: 0
Reputation: 658
I know this is a few years late, but for anyone who comes across this: Hadoop 2.6 had an unsafe cast from long to int, which caused IndexOutOfBoundsExceptions in many cases. I believe the fix was released in version 2.7.3. You can read about it at https://issues.apache.org/jira/browse/MAPREDUCE-6635. I hope this helps anyone experiencing this problem.
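To illustrate the failure mode (a simplified, self-contained sketch, not the actual Hadoop source): when a byte count that only fits in a long is narrowed to an int, the value wraps around, and the bogus length is then rejected by ByteBuffer's bounds check, which is the checkBounds frame in the stack trace above.

import java.nio.ByteBuffer;

public class NarrowingCastDemo {
    public static void main(String[] args) {
        byte[] dest = new byte[4096];
        ByteBuffer buffer = ByteBuffer.allocate(4096);

        long bytesLeftInSplit = 3_000_000_000L;  // fits in a long, not in an int
        int length = (int) bytesLeftInSplit;     // silently wraps to -1294967296

        // Passing the wrapped value as a read length fails the bounds check
        // and throws java.lang.IndexOutOfBoundsException.
        buffer.get(dest, 0, length);
    }
}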
Upvotes: 1