Reputation: 1385
I have a 90MB snappy compressed file that I am attempting to use as input to Hadoop 2.2.0 on AMI 3.0.4 in AWS EMR.
Immediately upon attempting to read the file my record reader gets the following exception:
2014-05-06 14:25:34,210 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:365)
...
I'm running on an m1.xlarge in AWS using the default memory and io.sort.mb settings. If we decompress the file and use that as input instead, everything works fine. The trouble is that we have a very large number of compressed files and don't want to go around decompressing all of them.
I'm not sure whether we're missing a configuration setting or something in our code. I'm not sure how to proceed.
Upvotes: 2
Views: 11748
Reputation: 1574
As per the log you have provided, it seems the size of the decompressed block is larger than your available heap space.
I don't know the m1.xlarge instance specifications on EMR offhand, but here are some things you can try to ward off this error.
Usually "Error running child" means the child JVM that YARN spawned cannot find enough heap space to continue its MR task.
Options to try :
1) Increase the heap in mapred.child.java.opts. This sets the JVM options for the separate child JVM process each task gets. By default it is -Xmx200m, which is small for any reasonable data analysis. Change the parameters -XmxNu (maximum heap size of N in units of u) and -XmsNu (initial heap size of N in units of u). Try 1 GB, i.e. -Xmx1g, see the effect, and if it succeeds then go smaller.
2) Set mapred.child.ulimit to 1.5 or 2 times the max heap size set above. It limits the amount of virtual memory for a child process (the value is specified in kilobytes).
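A matching fragment, assuming the 1 GB heap from option 1 (1.5 × 1 GB expressed in kilobytes is 1,572,864 KB):

```xml
<!-- mapred-site.xml: cap child virtual memory at ~1.5x the 1 GB heap;
     mapred.child.ulimit is given in kilobytes -->
<property>
  <name>mapred.child.ulimit</name>
  <value>1572864</value>
</property>
```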
3) Reduce mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, which set the maximum number of mappers and reducers running in parallel at a time.
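For example (the values here are illustrative, not a recommendation; pick them based on how much memory each task JVM now gets):

```xml
<!-- mapred-site.xml: limit concurrent tasks per node so that
     fewer, larger-heap JVMs run at once -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```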
4) io.sort.mb - which you have already tried. Try keeping it in the range 0.25 * mapred.child.java.opts < io.sort.mb < 0.5 * mapred.child.java.opts.
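With the 1 GB heap assumed above, that range works out to roughly 256-512 MB; a sketch at the low end:

```xml
<!-- mapred-site.xml: sort buffer at ~25% of a 1 GB task heap -->
<property>
  <name>io.sort.mb</name>
  <value>256</value>
</property>
```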
And at last, it's a trial-and-error method, so try and see which one sticks.
Upvotes: 2