Reputation: 5351
I have a Hadoop 2.2 cluster deployed on a small number of powerful machines. I am constrained to use YARN as the framework, which I am not very familiar with.
Thanks in advance for helping me melt these machines :)
Upvotes: 5
Views: 5910
Reputation: 31
I have the same problem. To increase the number of mappers, the usual recommendation is to reduce the input split size (each input split is processed by one mapper, and therefore by one container). I don't know how to do that:
Hadoop 2.2/YARN does not seem to take any of the following settings into account:
<property>
<name>mapreduce.input.fileinputformat.split.minsize</name>
<value>1</value>
</property>
<property>
<name>mapreduce.input.fileinputformat.split.maxsize</name>
<value>16777216</value>
</property>
<property>
<name>mapred.min.split.size</name>
<value>1</value>
</property>
<property>
<name>mapred.max.split.size</name>
<value>16777216</value>
</property>
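For reference, the new-API FileInputFormat derives the split size as max(minsize, min(maxsize, block size)), so if the mapreduce.input.fileinputformat.split.* values above were picked up, splits would be capped at 16 MiB. Below is a rough Python sketch of that arithmetic only. The 128 MiB block size and 1 GiB file size are assumed values for illustration, the function names are hypothetical, and the split count is an estimate that ignores the small slack Hadoop allows on the last split.

```python
MIB = 1024 * 1024


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Apply the max(min_size, min(max_size, block_size)) rule."""
    return max(min_size, min(max_size, block_size))


def estimate_num_splits(file_size: int, split_size: int) -> int:
    """Rough split count for one file: ceiling of file_size / split_size."""
    if file_size <= 0:
        return 0
    return (file_size + split_size - 1) // split_size


if __name__ == "__main__":
    block_size = 128 * MIB   # assumed HDFS block size
    file_size = 1024 * MIB   # assumed input file size

    # Values from the settings above: minsize = 1, maxsize = 16777216.
    split = compute_split_size(block_size, 1, 16777216)
    print(split // MIB, "MiB per split,",
          estimate_num_splits(file_size, split), "splits (approx.)")
```

Under these assumptions, each split is one map task, so a smaller maxsize translates directly into more mappers for the same input.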
best
Upvotes: 2
Reputation: 8705
1.
In MR1, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties dictated how many map and reduce slots each TaskTracker had.
These properties no longer exist in YARN. Instead, YARN uses yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, which control the amount of memory and CPU on each node that is available to both map and reduce tasks.
Essentially:
YARN has no TaskTrackers, only generic NodeManagers. Hence, there is no longer a separation between map slots and reduce slots. Everything depends on the amount of memory in use or requested.
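To make the memory-based model concrete, here is a minimal Python sketch of how many task containers could run concurrently on one node. It assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb (the exact rounding depends on the scheduler in use), considers memory only, and ignores the memory taken by the ApplicationMaster. The function names and the 64 GB node are hypothetical.

```python
import math


def normalize_request(requested_mb: int, min_allocation_mb: int) -> int:
    """Round a container request up to a multiple of the minimum allocation
    (assumed scheduler behaviour)."""
    multiples = max(1, math.ceil(requested_mb / min_allocation_mb))
    return multiples * min_allocation_mb


def containers_per_node(node_memory_mb: int, container_mb: int,
                        min_allocation_mb: int = 1024) -> int:
    """Memory-only upper bound on concurrent containers on one NodeManager."""
    return node_memory_mb // normalize_request(container_mb, min_allocation_mb)


if __name__ == "__main__":
    # Hypothetical node: yarn.nodemanager.resource.memory-mb = 65536
    # Hypothetical task size: mapreduce.map.memory.mb = 1536
    print(containers_per_node(65536, 1536))
```

With these assumed numbers, a 1536 MB request is rounded up to 2048 MB, so lowering the per-task memory or raising the NodeManager memory is what raises the number of concurrent tasks.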
2.
Using the web UIs you can get a lot of monitoring and administration information:
NameNode - http://:50070/
Resource Manager - http://:8088/
In addition, Apache Ambari is meant for this: http://ambari.apache.org/
And Hue for interfacing with the Hadoop/YARN cluster in many ways: http://gethue.com/
Upvotes: 4
Reputation: 2345
Upvotes: 3