Reputation: 121
Recently I was trying to understand how Mumak works (see, e.g., MAPREDUCE-728).
It basically takes a job trace and a topology trace and simulates Hadoop. I couldn't understand how it assigns splits across nodes. What does Mumak mean by a local map task and a non-local map task?
Upvotes: 1
Views: 1632
Reputation: 26902
In MapReduce there is the notion of "locality", which describes how far a task runs from the data it works on. The best locality is node-local: the task runs on a node that holds the data it needs. The second best is rack-local: the task runs on a node in the same rack as a node holding the data. Anything farther away is rack-remote. A "local" map task is one in the first category; the other two are non-local.
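To make the three levels concrete, here is a minimal sketch of how a task could be classified. It is not Mumak's actual code: the function name `classify_locality`, the host names, and the `rack_of` mapping are all hypothetical, and it assumes a simple two-level topology (host, then rack).

```python
from enum import Enum


class Locality(Enum):
    NODE_LOCAL = "node-local"
    RACK_LOCAL = "rack-local"
    RACK_REMOTE = "rack-remote"


def classify_locality(task_host, split_hosts, rack_of):
    """Classify a map task by its distance from its input split.

    task_host   -- host the task was scheduled on
    split_hosts -- hosts holding a replica of the split
    rack_of     -- dict mapping host name to rack name (assumed topology)
    """
    # Best case: the task runs on a host that stores the split.
    if task_host in split_hosts:
        return Locality.NODE_LOCAL
    # Second best: some replica lives in the same rack as the task.
    task_rack = rack_of[task_host]
    if any(rack_of[h] == task_rack for h in split_hosts):
        return Locality.RACK_LOCAL
    # Otherwise the data has to come from another rack.
    return Locality.RACK_REMOTE


# Hypothetical topology: two racks with two hosts each.
rack_of = {"h1": "/rack1", "h2": "/rack1", "h3": "/rack2", "h4": "/rack2"}
split_hosts = {"h1"}

for host in ("h1", "h2", "h3"):
    print(host, classify_locality(host, split_hosts, rack_of).value)
```

With this topology, a task on `h1` is node-local, one on `h2` is rack-local, and one on `h3` is rack-remote.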
Mumak can slow down tasks scheduled on non-local nodes through the following settings in your configuration file:
<property>
  <name>mumak.scale.racklocal</name>
  <value>1.5</value>
  <description>Scaling factor for task attempt runtime of rack-local over
  node-local</description>
</property>
<property>
  <name>mumak.scale.rackremote</name>
  <value>1.8</value>
  <description>Scaling factor for task attempt runtime of rack-remote over
  node-local</description>
</property>
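As a rough illustration of what these factors mean, the sketch below applies them to a node-local runtime. It assumes the factor is a plain multiplier on the node-local runtime, which is what the property descriptions suggest; the function name `simulated_runtime_ms` and the 10-second example runtime are made up for illustration and are not taken from Mumak's source.

```python
# Scaling factors taken from the configuration above; node-local is the
# baseline, so its factor is 1.0 by definition.
SCALE = {
    "node-local": 1.0,
    "rack-local": 1.5,   # mumak.scale.racklocal
    "rack-remote": 1.8,  # mumak.scale.rackremote
}


def simulated_runtime_ms(node_local_runtime_ms, locality):
    """Scale a node-local task runtime by the factor for its locality.

    Assumption: the simulated runtime is the node-local runtime multiplied
    by the configured factor.
    """
    return node_local_runtime_ms * SCALE[locality]


# Hypothetical map task that takes 10 seconds when node-local.
for locality in ("node-local", "rack-local", "rack-remote"):
    print(locality, simulated_runtime_ms(10_000, locality))
```

Under these settings, the same map task would be simulated as running 1.5 times longer when rack-local and 1.8 times longer when rack-remote.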
Upvotes: 1