Reputation: 1666
I am running Hadoop 0.21.0 in a single-node cluster to process a single large (> 200 GB) file. To decrease the execution time, I have tried different HDFS block sizes (128, 256, and 512 MB; 1, 1.5, and 1.75 GB). However, I get the following exception whenever the block size is >= 2 GB.
Note: I am using java-8-oracle.
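For reference, this is how the block size can be overridden per file at upload time. The property name `dfs.block.size` is the pre-2.x spelling (later versions renamed it `dfs.blocksize`); I am assuming it applies to 0.21.0:

```shell
# Upload with a 512 MB block size for this file only
# (dfs.block.size is the old property name; assumed valid for 0.21.0).
# 536870912 = 512 * 1024 * 1024
hadoop fs -D dfs.block.size=536870912 -put input.seq /user/me/input.seq
```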
2015-08-05 12:02:12,524 WARN org.apache.hadoop.mapred.Child: Exception running child : java.lang.IndexOutOfBoundsException
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:186)
at org.apache.hadoop.hdfs.BlockReader.read(BlockReader.java:113)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:466)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:517)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1518)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1483)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1451)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1432)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:60)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:460)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Upvotes: 1
Views: 186
Reputation: 1447
Yes — for the Hadoop version you are using (0.21.0), that is expected: block sizes of 2 GB or larger are not supported. The issue was fixed in the next version; see https://issues.apache.org/jira/browse/HDFS-96 for details.
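The 2 GB boundary is a strong hint that offsets or lengths were held in signed 32-bit `int`s somewhere in the old read path (consistent with the `IndexOutOfBoundsException` coming from `FSInputChecker.read`). A minimal sketch, in plain Java with no Hadoop dependency, of why exactly 2 GB is the breaking point:

```java
public class BlockSizeOverflow {
    public static void main(String[] args) {
        // 2 GB expressed in bytes; fits comfortably in a long.
        long twoGb = 2L * 1024 * 1024 * 1024;          // 2147483648

        // A signed 32-bit int tops out one byte short of 2 GB.
        System.out.println(Integer.MAX_VALUE);          // 2147483647

        // Narrowing 2 GB to int wraps around to a negative value,
        // so any bounds check against it (offset, length) fails.
        int asInt = (int) twoGb;
        System.out.println(asInt);                      // -2147483648
    }
}
```

So any block size up to 2 GB - 1 byte is representable, which matches your observation that 1.75 GB works and 2 GB does not.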
Upvotes: 2