Anuj

Reputation: 9632

out of Memory Error in Hadoop

I tried installing Hadoop following this http://hadoop.apache.org/common/docs/stable/single_node_setup.html document. When I tried executing this

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 

I am getting the following Exception

java.lang.OutOfMemoryError: Java heap space

Please suggest a solution so that I can try out the example. The full output is listed below. I am new to Hadoop, so I might have done something dumb. Any suggestion will be highly appreciated.

anuj@anuj-VPCEA13EN:~/hadoop$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
11/12/11 17:38:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/11 17:38:22 INFO mapred.FileInputFormat: Total input paths to process : 7
11/12/11 17:38:22 INFO mapred.JobClient: Running job: job_local_0001
11/12/11 17:38:22 INFO util.ProcessTree: setsid exited with exit code 0
11/12/11 17:38:22 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e49dcd
11/12/11 17:38:22 INFO mapred.MapTask: numReduceTasks: 1
11/12/11 17:38:22 INFO mapred.MapTask: io.sort.mb = 100
11/12/11 17:38:22 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
11/12/11 17:38:23 INFO mapred.JobClient:  map 0% reduce 0%
11/12/11 17:38:23 INFO mapred.JobClient: Job complete: job_local_0001
11/12/11 17:38:23 INFO mapred.JobClient: Counters: 0
11/12/11 17:38:23 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1257)
    at org.apache.hadoop.examples.Grep.run(Grep.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.Grep.main(Grep.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Upvotes: 60

Views: 101741

Answers (16)

Jay Prall

Reputation: 5465

If you are using Hadoop on Amazon EMR, a configuration can be added to increase the heap size:

[
  {
    "Classification": "hadoop-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_HEAPSIZE": "2048"
        },
        "Configurations": []
      }
    ]
  }
]
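
A sketch of how that classification might be applied when creating the cluster with the AWS CLI. The file name, release label, and instance settings below are placeholders, not part of the original answer:

    # Save the JSON above to a file and pass it via --configurations
    aws emr create-cluster \
        --release-label emr-5.30.0 \
        --applications Name=Hadoop \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --configurations file://hadoop-heapsize.json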

Upvotes: 0

Pravat Sutar

Reputation: 151

Configure the JVM heap size for your map and reduce processes. These sizes need to be less than the container memory configured for YARN (mapreduce.map.memory.mb and mapreduce.reduce.memory.mb). As a general rule, they should be 80% of the YARN physical memory settings.

Configure mapreduce.map.java.opts and mapreduce.reduce.java.opts to set the map and reduce heap sizes respectively, e.g.

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3278m</value>
</property>
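
If you would rather not edit mapred-site.xml, roughly the same settings can be passed per job with -D generic options. This is only a sketch; the container sizes below (2048 MB for map, 4096 MB for reduce) are assumed values chosen so that the -Xmx figures above come out to roughly 80% of them:

    # Sketch: pass heap and container sizes per job; tune both to your cluster
    hadoop jar hadoop-examples-*.jar grep \
        -Dmapreduce.map.memory.mb=2048 \
        -Dmapreduce.map.java.opts=-Xmx1638m \
        -Dmapreduce.reduce.memory.mb=4096 \
        -Dmapreduce.reduce.java.opts=-Xmx3278m \
        input output 'dfs[a-z.]+'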

Upvotes: 1

tworec

Reputation: 4747

You need to make adjustments to mapreduce.{map|reduce}.java.opts and also to mapreduce.{map|reduce}.memory.mb.

For example:

  hadoop jar <jarName> <fqcn> \
      -Dmapreduce.map.memory.mb=4096 \
      -Dmapreduce.map.java.opts=-Xmx3686m

Here is a good resource with an answer to this question.

Upvotes: 6

S.K. Venkat

Reputation: 1807

Make sure that mapreduce.child.java.opts has enough memory to run the MapReduce job. Also ensure that mapreduce.task.io.sort.mb is less than the heap set in mapreduce.child.java.opts.

Example:

 mapreduce.child.java.opts=-Xmx2048m

 mapreduce.task.io.sort.mb=100

Otherwise you'll hit the OOM issue even if HADOOP_CLIENT_OPTS in hadoop-env.sh is configured with enough memory.

Upvotes: 1

Adi Kish

Reputation: 89

I ran into a very similar issue last week. The input file I was using had a huge line in it which I could not view; that single line was almost 95% of my file size (95% of 1 GB! imagine that!). I would suggest you take a look at your input files first. You might have a malformed input file that you need to look into. Try increasing the heap space only after you have checked the input files.
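
If you suspect something similar, a quick sanity check (a sketch, not part of the original answer) is to print the longest line length of each input file and look for an outlier:

    # Print the length (in characters) of the longest line in each input file
    for f in input/*; do
        awk -v file="$f" 'length > max { max = length } END { print file, max }' "$f"
    done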

Upvotes: 0

Tudor

Reputation: 62469

You can assign more memory by editing the conf/mapred-site.xml file and adding the property:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

This will start the Hadoop JVMs with more heap space.

Upvotes: 39

borice

Reputation: 1049

On Ubuntu, the DEB install (at least for Hadoop 1.2.1) creates a symlink /etc/profile.d/hadoop-env.sh pointing to /etc/hadoop/hadoop-env.sh, which causes it to be sourced every time you log in. In my experience this is not necessary, because the /usr/bin/hadoop wrapper will eventually source it anyway (through /usr/libexec/hadoop-config.sh). On my system I removed the symlink, and I no longer get weird issues when changing the value of -Xmx in HADOOP_CLIENT_OPTS (previously, every time that hadoop-env.sh script ran, the client options variable was updated while keeping the old value, so the flags accumulated).
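
Roughly what that looks like (a sketch based on the paths above; double-check the symlink target before deleting anything):

    # Confirm the symlink exists and where it points, then remove it
    ls -l /etc/profile.d/hadoop-env.sh
    sudo rm /etc/profile.d/hadoop-env.sh
    # Start a fresh login shell so the stale environment is gone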

Upvotes: 0

Satyajit Rai

Reputation: 96

Exporting the variables by running the following command worked for me:

. conf/hadoop-env.sh

Upvotes: 0

Mitra Bhanu

Reputation: 41

We faced the same situation.

Modifying the hadoop-env.sh worked out for me.

The export HADOOP_HEAPSIZE line is commented out by default; uncomment it and set a size of your choice.

By default, the assigned heap size is 1000 MB.
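
As a sketch, the relevant lines in conf/hadoop-env.sh end up looking something like this (2000 is just an example value in MB):

    # The maximum amount of heap to use, in MB. Default is 1000.
    export HADOOP_HEAPSIZE=2000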

Upvotes: 4

Odysseus

Reputation: 1092

I hit the same exception on Ubuntu with Hadoop 1.1.1. The solution was simple: edit the shell variable $HADOOP_CLIENT_OPTS that is set by an init script. It took a long time to find, though =(
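
To save others the search, something like this (a sketch; ~/hadoop/conf is assumed from the question's install location) shows which scripts set that variable:

    # Find every script that sets HADOOP_CLIENT_OPTS
    grep -R "HADOOP_CLIENT_OPTS" /etc/hadoop /etc/profile.d ~/hadoop/conf 2>/dev/null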

Upvotes: 2

etlolap

Reputation: 531

After trying many combinations, I finally concluded that the same error on my environment (Ubuntu 12.04, Hadoop 1.0.4) was due to two issues.

  1. The same fix as Zach Garner mentioned in his answer.
  2. Don't forget to execute "ssh localhost" first. Believe it or not, a missing ssh connection can throw a Java heap space error as well (see the sketch after this list).
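
A sketch of that ssh check, following the single-node setup guide (the key type and paths may differ on your system):

    # Make sure passwordless ssh to localhost works before running the job
    ssh localhost
    # If it asks for a password or fails, set up a key as in the setup guide:
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys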

Upvotes: 7

Brian C.

Reputation: 7986

I installed Hadoop 1.0.4 from the binary tar and had the out-of-memory problem. I tried Tudor's, Zach Garner's, Nishant Nagwani's, and Andris Birkmanis's solutions, but none of them worked for me.

Editing the bin/hadoop to ignore $HADOOP_CLIENT_OPTS worked for me:

...
elif [ "$COMMAND" = "jar" ] ; then
    CLASS=org.apache.hadoop.util.RunJar
    # Commented out this line to avoid the out of memory error:
    # HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    # and changed it to:
    HADOOP_OPTS="$HADOOP_OPTS "
...

I'm assuming that there is a better way to do this but I could not find it.

Upvotes: 2

Zach Garner

Reputation: 2049

For anyone using RPM or DEB packages, the documentation and common advice are misleading. These packages install the Hadoop configuration files into /etc/hadoop, and those take priority over other settings.

/etc/hadoop/hadoop-env.sh sets the maximum Java heap memory for Hadoop; by default it is:

   export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"

This -Xmx setting is too low. Simply change it to the following and rerun:

   export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
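
If you prefer to script the change, a one-liner like this works with GNU sed (it keeps a .bak backup; verify the file afterwards):

    # Bump the client heap from 128m to 2048m in place
    sudo sed -i.bak 's/-Xmx128m/-Xmx2048m/' /etc/hadoop/hadoop-env.sh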

Upvotes: 80

wufawei

Reputation: 51

You can solve this problem by editing the file /etc/hadoop/hadoop-env.sh.

Hadoop gives the /etc/hadoop config directory precedence over the conf directory.

I ran into the same situation as well.

Upvotes: 4

Andris Birkmanis

Reputation: 403

Another possibility is editing hadoop-env.sh, which contains export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS". Changing 128m to 1024m helped in my case (Hadoop 1.0.0.1 on Debian).

Upvotes: 12

Nishant Nagwani

Reputation: 1270

Run your job like the example below:

bin/hadoop jar hadoop-examples-*.jar grep -D mapred.child.java.opts=-Xmx1024M input output 'dfs[a-z.]+' 

The heap space is set to 32 MB or 64 MB by default. You can increase the heap space in the properties file, as Tudor pointed out, or you can change it for this particular job by setting the property on the command line as shown above.

Upvotes: 2
