Reputation: 25418
I am running a task in pseudo-distributed mode on my 4-core laptop. How can I ensure that all cores are effectively used? Currently my job tracker shows that only one job is executing at a time. Does that mean only one core is being used?
The following are my configuration files.
conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
EDIT: As per the answer, I need to add the following properties to mapred-site.xml:
<property>
<name>mapred.map.tasks</name>
<value>4</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>4</value>
</property>
Upvotes: 8
Views: 2856
Reputation: 33495
mapreduce.tasktracker.map.tasks.maximum
and mapreduce.tasktracker.reduce.tasks.maximum
properties control the maximum number of map and reduce tasks that run concurrently per node. For a 4-core processor, start with 2/2 and change the values from there if required. A slot is either a map slot or a reduce slot; setting the values to 4/4 will make the Hadoop framework launch up to 4 map and 4 reduce tasks simultaneously, so a total of 8 map and reduce tasks can run at a time on a node.
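For reference, a minimal sketch of how these properties could be added to mapred-site.xml, using the suggested 2/2 starting values (adjust to taste):
<property>
<name>mapreduce.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapreduce.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>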
mapred.map.tasks
and mapred.reduce.tasks
properties control the total number of map/reduce tasks for the job, not the number of tasks per node. Also, mapred.map.tasks
is only a hint to the Hadoop framework; the actual number of map tasks for the job equals the number of InputSplits.
Upvotes: 6
Reputation: 66886
mapred.map.tasks
and mapred.reduce.tasks
will control this, and (I believe) would be set in mapred-site.xml
. However, this establishes them as cluster-wide defaults; more usually you would configure these on a per-job basis. You can set the same params on the java command line with -D.
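For example, a per-job override might look something like this (the jar and class names are placeholders, and this assumes the job's driver uses ToolRunner/GenericOptionsParser so that -D options are picked up):
hadoop jar myjob.jar com.example.MyJob -D mapred.map.tasks=4 -D mapred.reduce.tasks=4 /input /output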
Upvotes: 3