Reputation: 1448
In Hadoop v1, I have assigned each 7 mapper and reducer slot with size of 1GB, my mappers & reducers runs fine. My machine has 8G memory, 8 processor. Now with YARN, when run the same application on the same machine, I got container error. By default, I have this settings:
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
It gave me error:
Container [pid=28920,containerID=container_1389136889967_0001_01_000121] is running beyond virtual memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
I then tried to set memory limit in mapred-site.xml:
<property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
But still getting error:
Container [pid=26783,containerID=container_1389136889967_0009_01_000002] is running beyond physical memory limits. Current usage: 4.2 GB of 4 GB physical memory used; 5.2 GB of 8.4 GB virtual memory used. Killing container.
I'm confused why the the map task need this much memory. In my understanding, 1GB of memory is enough for my map/reduce task. Why as I assign more memory to container, the task use more? Is it because each task gets more splits? I feel it's more efficient to decrease the size of container a little bit and create more containers, so that more tasks are running in parallel. The problem is how can I make sure each container won't be assigned more splits than it can handle?
Upvotes: 96
Views: 160358
Reputation: 139
I am practicing Hadoop programs (version hadoop3). Via virtual box I have installed Linux OS. We allocate very limited memory at time of installation of Linux.
By setting the following memory limit properties in mapred-site.xml
and restarting your HDFS and YARN then my program worked.
<property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
Upvotes: 1
Reputation: 3705
I haven't personally checked, but hadoop-yarn-container-virtual-memory-understanding-and-solving-container-is-running-beyond-virtual-memory-limits-errors sounds very reasonable
I solved the issue by changing yarn.nodemanager.vmem-pmem-ratio
to a higher value , and I would agree that:
Another less recommended solution is to disable the virtual memory check by setting yarn.nodemanager.vmem-check-enabled to false.
Upvotes: 0
Reputation: 1057
Running yarn on Windows Linux subsystem with Ubunto OS, error "running beyond virtual memory limits, Killing container" I resolved it by disabling virtual memory check in the file yarn-site.xml
<property> <name>yarn.nodemanager.vmem-check-enabled</name> <value>false</value> </property>
Upvotes: 1
Reputation: 4575
You should also properly configure the maximum memory allocations for MapReduce. From this HortonWorks tutorial:
[...]
Each machine in our cluster has 48 GB of RAM. Some of this RAM should be >reserved for Operating System usage. On each node, we’ll assign 40 GB RAM for >YARN to use and keep 8 GB for the Operating System
For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.
In mapred-site.xml:
mapreduce.map.memory.mb
: 4096
mapreduce.reduce.memory.mb
: 8192Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.
In mapred-site.xml:
mapreduce.map.java.opts
:-Xmx3072m
mapreduce.reduce.java.opts
:-Xmx6144m
The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.
To sum it up:
mapreduce
configs, not the mapred
ones. EDIT: This comment is not applicable anymore now that you've edited your question.java.opts
settings listed above.Finally, you may want to check this other SO question that describes a similar problem (and solution).
Upvotes: 109
Reputation: 1815
There is a check placed at Yarn level for Virtual and Physical memory usage ratio. Issue is not only that VM doesn't have sufficient physical memory. But it is because Virtual memory usage is more than expected for given physical memory.
Note : This is happening on Centos/RHEL 6 due to its aggressive allocation of virtual memory.
It can be resolved either by :
Disable virtual memory usage check by setting yarn.nodemanager.vmem-check-enabled to false;
Increase VM:PM ratio by setting yarn.nodemanager.vmem-pmem-ratio to some higher value.
References :
https://issues.apache.org/jira/browse/HADOOP-11364
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
Add following property in yarn-site.xml
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
Upvotes: 51
Reputation: 459
We also faced this issue recently. If the issue is related to mapper memory, couple of things I would like to suggest that needs to be checked are.
Upvotes: 1
Reputation: 93
While working with spark in EMR I was having the same problem and setting maximizeResourceAllocation=true
did the trick; hope it helps someone. You have to set it when you create the cluster. From the EMR docs:
aws emr create-cluster --release-label emr-5.4.0 --applications Name=Spark \
--instance-type m3.xlarge --instance-count 2 --service-role EMR_DefaultRole --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole --configurations https://s3.amazonaws.com/mybucket/myfolder/myConfig.json
Where myConfig.json should say:
[
{
"Classification": "spark",
"Properties": {
"maximizeResourceAllocation": "true"
}
}
]
Upvotes: 2
Reputation: 902
I had a really similar issue using HIVE in EMR. None of the extant solutions worked for me -- ie, none of the mapreduce configurations worked for me; and neither did setting yarn.nodemanager.vmem-check-enabled
to false.
However, what ended up working was setting tez.am.resource.memory.mb
, for example:
hive -hiveconf tez.am.resource.memory.mb=4096
Another setting to consider tweaking is yarn.app.mapreduce.am.resource.mb
Upvotes: 15
Reputation: 139
I can't comment on the accepted answer, due to low reputation. However, I would like to add, this behavior is by design. The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.
Upvotes: 8