Reputation: 9632
I tried installing Hadoop following the http://hadoop.apache.org/common/docs/stable/single_node_setup.html document. When I tried executing
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
I am getting the following Exception
java.lang.OutOfMemoryError: Java heap space
Please suggest a solution so that I can try out the example. The entire exception is listed below. I am new to Hadoop, so I might have done something dumb. Any suggestion will be highly appreciated.
anuj@anuj-VPCEA13EN:~/hadoop$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
11/12/11 17:38:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/11 17:38:22 INFO mapred.FileInputFormat: Total input paths to process : 7
11/12/11 17:38:22 INFO mapred.JobClient: Running job: job_local_0001
11/12/11 17:38:22 INFO util.ProcessTree: setsid exited with exit code 0
11/12/11 17:38:22 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e49dcd
11/12/11 17:38:22 INFO mapred.MapTask: numReduceTasks: 1
11/12/11 17:38:22 INFO mapred.MapTask: io.sort.mb = 100
11/12/11 17:38:22 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
11/12/11 17:38:23 INFO mapred.JobClient: map 0% reduce 0%
11/12/11 17:38:23 INFO mapred.JobClient: Job complete: job_local_0001
11/12/11 17:38:23 INFO mapred.JobClient: Counters: 0
11/12/11 17:38:23 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1257)
at org.apache.hadoop.examples.Grep.run(Grep.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Grep.main(Grep.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Upvotes: 60
Views: 101741
Reputation: 5465
If you are using Hadoop on Amazon EMR, a configuration can be added to increase the heap size:
[
  {
    "Classification": "hadoop-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_HEAPSIZE": "2048"
        },
        "Configurations": []
      }
    ]
  }
]
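If the JSON above is saved to a file, it can be passed to the AWS CLI when creating the cluster. A minimal sketch (the file name, cluster name, release label and instance settings below are placeholders, not from the original answer):
# Apply the hadoop-env classification above when launching an EMR cluster.
# All names and sizes here are example values -- adjust for your own setup.
aws emr create-cluster \
  --name "hadoop-heap-example" \
  --release-label emr-5.33.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --configurations file://hadoop-heapsize.json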
Upvotes: 0
Reputation: 151
Configure the JVM heap size for your map and reduce processes. These sizes need to be less than the physical memory you configured for the containers (the mapreduce.map.memory.mb and mapreduce.reduce.memory.mb settings). As a general rule, they should be 80% of the size of the YARN physical memory settings.
Configure mapreduce.map.java.opts and mapreduce.reduce.java.opts to set the map and reduce heap sizes respectively, e.g.:
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1638m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx3278m</value>
</property>
Upvotes: 1
Reputation: 4747
You need to make adjustments to mapreduce.{map|reduce}.java.opts and also to mapreduce.{map|reduce}.memory.mb.
For example:
hadoop jar <jarName> <fqcn> \
-Dmapreduce.map.memory.mb=4096 \
-Dmapreduce.map.java.opts=-Xmx3686m
Here is a good resource with an answer to this question.
Upvotes: 6
Reputation: 1807
Make sure mapreduce.child.java.opts has enough memory to run the MapReduce job. Also ensure that mapreduce.task.io.sort.mb is less than mapreduce.child.java.opts.
Example:
mapreduce.child.java.opts=-Xmx2048m
mapreduce.task.io.sort.mb=100
Otherwise you'll hit the OOM issue even if HADOOP_CLIENT_OPTS in hadoop-env.sh is configured with enough memory.
Upvotes: 1
Reputation: 89
I ended up with a very similar issue last week. The input file I was using had a huge line in it which I could not view; that line was almost 95% of my file size (95% of 1 GB! imagine that!). I would suggest you take a look at your input files first. You might have a malformed input file that you want to look into. Try increasing heap space after you check the input files.
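One quick way to spot such a line (a sketch assuming GNU coreutils and that your inputs live under input/) is to print the longest line length of each input file:
# Print the length of the longest line in each input file; a single huge
# line can blow up the map task's heap when it is read as one record.
wc -L input/*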
Upvotes: 0
Reputation: 62469
You can assign more memory by editing the conf/mapred-site.xml file and adding the property:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
This will start the Hadoop JVMs with more heap space.
Upvotes: 39
Reputation: 1049
On Ubuntu using the DEB install (at least for Hadoop 1.2.1) there is a /etc/profile.d/hadoop-env.sh symlink created to /etc/hadoop/hadoop-env.sh, which causes it to load every time you log in. In my experience this is not necessary, as the /usr/bin/hadoop wrapper itself will eventually call it (through /usr/libexec/hadoop-config.sh). On my system I've removed the symlink, and I no longer get weird issues when changing the value for -Xmx in HADOOP_CLIENT_OPTS (because every time that hadoop-env.sh script is run, the client options environment variable is updated again while keeping the old value).
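For reference, removing that symlink on such a system might look like this; hadoop-env.sh is then only sourced by the wrapper scripts mentioned above:
# Remove the login-time symlink; /usr/bin/hadoop still sources
# /etc/hadoop/hadoop-env.sh itself via /usr/libexec/hadoop-config.sh.
sudo rm /etc/profile.d/hadoop-env.sh
You may need to log out and back in for the change to take effect.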
Upvotes: 0
Reputation: 96
Exporting the variables by running the following command worked for me:
. conf/hadoop-env.sh
Upvotes: 0
Reputation: 41
We faced the same situation.
Modifying hadoop-env.sh worked out for me: the export HADOOP_HEAPSIZE line is commented out by default, so uncomment it and provide a size of your choice.
By default the HEAPSIZE assigned is 1000 MB.
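For example, the uncommented line in conf/hadoop-env.sh could look like this (2048 is just an example value; pick one that fits your machine):
# The maximum amount of heap to use, in MB (the default is 1000).
export HADOOP_HEAPSIZE=2048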
Upvotes: 4
Reputation: 1092
I got the same exception with Ubuntu and Hadoop 1.1.1. The solution was simple: edit the shell variable $HADOOP_CLIENT_OPTS, which is set by some init script. But it took a long time to find it =(
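For example, depending on where your install sets it, the edit could look like this (1024m is just an example value):
# Give the Hadoop client JVM a larger heap than the default.
export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"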
Upvotes: 2
Reputation: 531
After trying so many combinations, I finally concluded that the same error on my environment (Ubuntu 12.04, Hadoop 1.0.4) was due to two issues.
Upvotes: 7
Reputation: 7986
I installed Hadoop 1.0.4 from the binary tar and had the out-of-memory problem. I tried Tudor's, Zach Garner's, Nishant Nagwani's and Andris Birkmanis's solutions, but none of them worked for me.
Editing bin/hadoop to ignore $HADOOP_CLIENT_OPTS worked for me:
...
elif [ "$COMMAND" = "jar" ] ; then
CLASS=org.apache.hadoop.util.RunJar
# Changed this line to avoid the out of memory error:
#HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
# changed to:
HADOOP_OPTS="$HADOOP_OPTS "
...
I'm assuming that there is a better way to do this but I could not find it.
Upvotes: 2
Reputation: 2049
For anyone using RPM or DEB packages, the documentation and common advice is misleading. These packages install Hadoop configuration files into /etc/hadoop, and these take priority over other settings.
The /etc/hadoop/hadoop-env.sh file sets the maximum Java heap memory for Hadoop; by default it is:
export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"
This Xmx setting is too low. Simply change it to the following and rerun:
export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
Upvotes: 80
Reputation: 51
I also ran into the same situation. You can solve this problem by editing the file /etc/hadoop/hadoop-env.sh: Hadoop gives the /etc/hadoop config directory precedence over the conf directory.
Upvotes: 4
Reputation: 403
Another possibility is editing hadoop-env.sh, which contains export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS". Changing 128m to 1024m helped in my case (Hadoop 1.0.0.1 on Debian).
Upvotes: 12
Reputation: 1270
Run your job like the one below:
bin/hadoop jar hadoop-examples-*.jar grep -D mapred.child.java.opts=-Xmx1024M input output 'dfs[a-z.]+'
The heap space is set to 32 MB or 64 MB by default. You can increase the heap space in the properties file, as Tudor pointed out, or you can change it just for this particular job by setting this property on the command line.
Upvotes: 2