Reputation: 8374
I'm in the process of moving our application from Hadoop 1.0.3 to 2.7, on EMR v5.1.0. I got it running, but I'm still having trouble getting my head around the resource-allocation system in YARN. With the default settings provided by EMR, Hadoop only allocates one container per node, even if I select a larger instance type for the nodes. This is a problem, since we'll now be using twice as many nodes to do the same amount of work.
I want to squeeze more containers into one node, and ensure that we're using all the available resources. I assume that I shouldn't touch yarn.nodemanager.resource.memory-mb or yarn.nodemanager.resource.cpu-vcores, since those are set by EMR to reflect the actual available resources. Which settings do I have to change?
Upvotes: 1
Views: 1640
Reputation: 120
Your container sizes are defined by memory (the default criterion for a container) and vcores. The following can be configured:
yarn.scheduler.increment-allocation-mb
yarn.scheduler.minimum-allocation-vcores
All of the following constraints must be satisfied. They apply per container, except for yarn.nodemanager.resource.cpu-vcores and yarn.nodemanager.resource.memory-mb, which are per NodeManager (and hence per DataNode):
1 <= yarn.scheduler.minimum-allocation-vcores <= yarn.scheduler.maximum-allocation-vcores
yarn.scheduler.maximum-allocation-vcores <= yarn.nodemanager.resource.cpu-vcores
yarn.scheduler.increment-allocation-vcores = 1
1024 <= yarn.scheduler.minimum-allocation-mb <= yarn.scheduler.maximum-allocation-mb
yarn.scheduler.maximum-allocation-mb <= yarn.nodemanager.resource.memory-mb
yarn.scheduler.increment-allocation-mb = 512
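To make the inequalities above concrete, here is a rough Python sketch that checks a set of values against them and estimates how many containers of a given size fit on one node. The function names, dictionary keys, and the example numbers (a node with 12288 MB and 8 vcores) are made up for illustration and are not EMR defaults. The estimate is plain integer division and ignores any rounding the scheduler applies to requests, so treat it as an upper bound rather than what YARN will actually do.

```python
def check_yarn_settings(s):
    """Return a list of violated constraints; an empty list means all hold.

    `s` is a plain dict with hypothetical keys: the scheduler minimum and
    maximum allocations, and the per-NodeManager resource totals.
    """
    problems = []
    if not 1 <= s["min_vcores"] <= s["max_vcores"]:
        problems.append("need 1 <= minimum vcores <= maximum vcores")
    if not s["max_vcores"] <= s["node_vcores"]:
        problems.append("maximum vcores exceeds NodeManager cpu-vcores")
    if not 1024 <= s["min_mb"] <= s["max_mb"]:
        problems.append("need 1024 <= minimum mb <= maximum mb")
    if not s["max_mb"] <= s["node_mb"]:
        problems.append("maximum mb exceeds NodeManager memory-mb")
    return problems


def containers_per_node(node_mb, node_vcores, container_mb, container_vcores):
    """Estimate how many equal-sized containers fit on one node.

    Assumption: both memory and vcores limit the count. If the scheduler
    only considers memory, the memory term alone is the relevant one.
    """
    by_memory = node_mb // container_mb
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


# Illustrative values only.
settings = {
    "min_vcores": 1, "max_vcores": 8, "node_vcores": 8,
    "min_mb": 1024, "max_mb": 12288, "node_mb": 12288,
}
print(check_yarn_settings(settings))
print(containers_per_node(12288, 8, 3072, 1))
```

With these example numbers, memory is the limiting resource, so requesting smaller containers is what lets more of them fit on a node.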
This link may also be helpful: https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_ig_yarn_tuning.html
Upvotes: 1