I'm using Elastic MapReduce (Hadoop 2.0 with YARN) on AWS.
The configuration is the following:
10 x g2.2xlarge core instances, each with 15 GB of RAM and 8 vCPUs
yarn.nodemanager.vmem-check-enabled=false
yarn.scheduler.minimum-allocation-mb=2048
yarn.nodemanager.resource.memory-mb=12288
mapreduce.map.memory.mb=3072
When running a job, the scheduler shows that only 81.7% of the cluster is allocated:
Used Capacity: 81.7%
Absolute Used Capacity: 81.7%
Absolute Capacity: 100.0%
Absolute Max Capacity: 100.0%
Used Resources:
Num Schedulable Applications: 1
Num Non-Schedulable Applications: 0
Num Containers: 25
Max Applications: 10000
Max Applications Per User: 10000
Max Schedulable Applications: 6
Max Schedulable Applications Per User: 6
Configured Capacity: 100.0%
Configured Max Capacity: 100.0%
Configured Minimum User Limit Percent: 100%
Configured User Limit Factor: 1.0
Active users: hadoop
The scheduler assigns at most 3 containers per node, and the total number of containers is capped at 25.
Why does it allocate only 25 containers?
From the memory settings I would expect
yarn.nodemanager.resource.memory-mb (12288) / mapreduce.map.memory.mb (3072) = 4 containers per node, i.e. 40 containers in total.
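To spell out that expectation, here is the arithmetic as a minimal Python sketch (the 10-node count comes from the cluster description above):

# Expected containers, ignoring any scheduler rounding:
nodes = 10
nm_memory_mb = 12288     # yarn.nodemanager.resource.memory-mb
map_container_mb = 3072  # mapreduce.map.memory.mb
per_node = nm_memory_mb // map_container_mb  # 12288 // 3072 = 4
print(per_node, per_node * nodes)            # 4 per node, 40 in total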
Thanks
P.S. This looks like a similar question, but it's unanswered: How concurrent # mappers and # reducers are calculated in Hadoop 2 + YARN?
I got it working after going through this tutorial. Two things were changed, and the final settings that worked for me were:
yarn.nodemanager.vmem-pmem-ratio=50
yarn.nodemanager.resource.memory-mb=12288
yarn.scheduler.minimum-allocation-mb=3057
yarn.app.mapreduce.am.resource.mb=6114
mapreduce.map.java.opts=-Xmx2751m
mapreduce.map.memory.mb=3057
Now it fully allocates 4 containers per node.
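As far as I understand, this works because YARN rounds every container request up to the next multiple of yarn.scheduler.minimum-allocation-mb. With the original settings a 3072 MB request was rounded up to 4096 MB, so only 3 containers fit per node; with the minimum allocation equal to the map container size (3057 MB), nothing is wasted and 4 fit. A small Python sketch of that rounding rule (my reading of the scheduler's normalization, not something from the settings above):

import math

def containers_per_node(nm_mb, request_mb, min_alloc_mb):
    # YARN normalizes each request up to a multiple of the minimum allocation
    normalized = math.ceil(request_mb / min_alloc_mb) * min_alloc_mb
    return nm_mb // normalized

print(containers_per_node(12288, 3072, 2048))  # old settings -> 3 per node
print(containers_per_node(12288, 3057, 3057))  # new settings -> 4 per node

This also appears to explain the original 81.7% figure: presumably 24 map containers at 4096 MB plus one ApplicationMaster container at 2048 MB gives 100352 MB, which is 81.7% of the cluster's 122880 MB total.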