In a related question (How to set the precise max number of concurrently running tasks per node in Hadoop 2.4.0 on Elastic MapReduce), I ask for formulas relating the number of concurrently running mappers/reducers to YARN and MR2 memory parameters. It turns out that on Elastic MapReduce, when my cluster has between 2 and 10 c3.2xlarge nodes, variations of the formulas mentioned there work okay, giving me 7-9 concurrently running mappers per node; but when the number of c3.2xlarges is 20 or 40, I get cluster underutilization: only 1-4 mappers run per node. Since my job is CPU-bound, this is particularly awful: MR2 delivers _half_ the performance of MR1 for me.
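For reference, the usual per-node estimate behind those formulas can be sketched as below. This is a minimal sketch, assuming the CapacityScheduler with its default memory-only resource calculator, which rounds each container request up to a multiple of `yarn.scheduler.minimum-allocation-mb`. It ignores vcores and the one MRAppMaster container per job, and the function name and example values are hypothetical, not EMR defaults.

```python
import math


def containers_per_node(node_memory_mb: int, container_memory_mb: int,
                        min_allocation_mb: int) -> int:
    """Hypothetical helper: estimate how many map containers fit on one
    NodeManager when the scheduler accounts for memory only.

    node_memory_mb      -> yarn.nodemanager.resource.memory-mb
    container_memory_mb -> mapreduce.map.memory.mb
    min_allocation_mb   -> yarn.scheduler.minimum-allocation-mb
    """
    # Assumption: the scheduler rounds each request up to a multiple of the
    # minimum allocation before placing it on a node.
    normalized_mb = math.ceil(container_memory_mb / min_allocation_mb) * min_allocation_mb
    # Whole containers that fit into the memory the NodeManager advertises.
    return node_memory_mb // normalized_mb


if __name__ == "__main__":
    # Hypothetical settings, not measured EMR defaults.
    print(containers_per_node(12288, 1536, 512))
    print(containers_per_node(12288, 1600, 512))  # 1600 rounds up to 2048
```

If this estimate matches what you see on the small clusters but not on the large ones, the per-node memory settings by themselves do not account for the drop, since none of these inputs depends on the number of nodes.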
Why is this happening?
You will be limited by what the NameNode can dish out. You can, and should, specify a larger instance type for the NameNode (on EMR, the master node) as you increase the number of task nodes. Note that the MR1 task-configuration page was never updated for the c3 instance types: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration.html