Reputation: 1385
I have a 90MB snappy compressed file that I am attempting to use as input to Hadoop 2.2.0 on AMI 3.0.4 in AWS EMR.
Immediately upon attempting to read the file my record reader gets the following exception:
2014-05-06 14:25:34,210 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:365)
...
I'm running on an m1.xlarge in AWS using the default memory and io.sort.mb settings. If we decompress the file and use that as input instead, everything works fine. The trouble is that we have a very large number of compressed files and don't want to go around decompressing all of them.
I'm not sure whether we're missing a configuration setting or something in our code. I'm not sure how to proceed.
Upvotes: 2
Views: 11748
Reputation: 1574
As per the log you have provided, it seems the size of the decompressed block is larger than your available heap space.
I don't know the m1.xlarge instance specifications on EMR offhand, but here are some things you can try to ward off this error.
Usually "Error running child" means the child JVM that YARN spawned cannot find enough heap space to continue its MR task.
Options to try :
1) Increase the heap in mapred.child.java.opts. This sets the JVM options for the separate child JVM process each task gets. By default it is -Xmx200m, which is small for any reasonable data analysis. Change the parameters -XmxNu (maximum heap size of N in units of u) and -XmsNu (initial heap size of N in units of u). Try 1 GB, i.e. -Xmx1g, see the effect, and if it succeeds then go smaller.
2) Set mapred.child.ulimit to 1.5 or 2 times the max heap size set above. It limits the amount of virtual memory for a child process (the value is specified in kilobytes).
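A matching fragment, assuming the 1 GB heap from option 1 (1.5 × 1 GB expressed in kilobytes is 1,572,864 KB):

```xml
<!-- mapred-site.xml: cap child virtual memory at ~1.5x the 1 GB heap;
     mapred.child.ulimit is given in kilobytes -->
<property>
  <name>mapred.child.ulimit</name>
  <value>1572864</value>
</property>
```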
3) Reduce mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, which set the maximum number of mappers and reducers running in parallel at a time.
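For example (the values here are illustrative, not a recommendation; pick them based on how much memory each task JVM now gets):

```xml
<!-- mapred-site.xml: limit concurrent tasks per node so that
     fewer, larger-heap JVMs run at once -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```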
4) io.sort.mb - which you have already tried. Try keeping it in the range 0.25 * mapred.child.java.opts < io.sort.mb < 0.5 * mapred.child.java.opts.
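With the 1 GB heap assumed above, that range works out to roughly 256-512 MB; a sketch at the low end:

```xml
<!-- mapred-site.xml: sort buffer at ~25% of a 1 GB task heap -->
<property>
  <name>io.sort.mb</name>
  <value>256</value>
</property>
```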
And at last, it's a trial-and-error method, so try and see which one sticks.
Upvotes: 2