Reputation: 25418
I am running a task in pseudo-distributed mode on my 4-core laptop. How can I ensure that all cores are effectively used? Currently my job tracker shows that only one job is executing at a time. Does that mean only one core is being used?
The following are my configuration files.
conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
EDIT: As per the answer, I need to add the following properties to mapred-site.xml:
<property>
<name>mapred.map.tasks</name>
<value>4</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>4</value>
</property>
Upvotes: 8
Views: 2856
Reputation: 33495
mapreduce.tasktracker.map.tasks.maximum
and mapreduce.tasktracker.reduce.tasks.maximum
properties control the maximum number of map and reduce tasks that run concurrently per node. For a 4-core processor, start with 2/2 and change the values from there if required. A slot is either a map slot or a reduce slot; setting the values to 4/4 will make the Hadoop framework launch up to 4 map and 4 reduce tasks simultaneously, so a total of 8 map and reduce tasks can run at a time on a node.
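For reference, a minimal sketch of how these properties could be added to mapred-site.xml, using the suggested 2/2 starting values (adjust to taste):
<property>
<name>mapreduce.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapreduce.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>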
mapred.map.tasks
and mapred.reduce.tasks
properties control the total number of map/reduce tasks for the job, not the number of tasks per node. Also, mapred.map.tasks
is only a hint to the Hadoop framework; the actual number of map tasks for the job equals the number of InputSplits.
Upvotes: 6
Reputation: 66886
mapred.map.tasks
and mapred.reduce.tasks
will control this, and (I believe) would be set in mapred-site.xml
. However, this establishes them as cluster-wide defaults; more usually you would configure these on a per-job basis. You can set the same params on the java command line with -D.
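For example, a per-job override might look something like this (the jar and class names are placeholders, and this assumes the job's driver uses ToolRunner/GenericOptionsParser so that -D options are picked up):
hadoop jar myjob.jar com.example.MyJob -D mapred.map.tasks=4 -D mapred.reduce.tasks=4 /input /output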
Upvotes: 3