Reputation: 91959
I have a Hadoop cluster of three machines,
where one machine acts as both master and slave.
When I run the wordcount example, it runs map tasks on two machines, worker1
and worker2.
But when I run my own code, it runs on only one machine, worker1.
How can I make map tasks run on all machines?
Input Split Locations
/default-rack/master
/default-rack/worker1
/default-rack/worker2
FIXED!!!
I added the following to my mapred-site.xml
configuration and it solved the problem:
<property>
  <name>mapred.map.tasks</name>
  <value>100</value>
</property>
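(Side note for anyone on a newer release: in Hadoop 2.x this property was renamed to mapreduce.job.maps, so the equivalent entry would presumably look like the fragment below. Either way, the value is only a hint to the framework, not a hard guarantee of the number of map tasks.)

```xml
<property>
  <name>mapreduce.job.maps</name>
  <value>100</value>
</property>
```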
Upvotes: 2
Views: 1287
Reputation: 39893
How big is your input? Hadoop splits a job into input splits, and if your file is too small, it will produce only one split.
Try a larger file, say around 1 GB in size, and see how many mappers you get then.
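To make the split arithmetic concrete, here is a rough back-of-the-envelope sketch (not Hadoop's actual FileInputFormat code, which also honours min/max split sizes) assuming splits follow the HDFS block size, with 64 MB as the old default:

```python
# Rough sketch of how the number of input splits scales with file size,
# assuming one split per HDFS block (64 MB was the Hadoop 1.x default).
# The real FileInputFormat logic is more nuanced, but the intuition holds:
# a tiny file -> one split -> one map task -> one busy node.
BLOCK_SIZE = 64 * 1024 * 1024  # assumed default dfs.block.size

def num_splits(file_size_bytes, split_size=BLOCK_SIZE):
    """One split per full block, plus one for any remainder."""
    if file_size_bytes == 0:
        return 0
    return -(-file_size_bytes // split_size)  # ceiling division

print(num_splits(1024))                # a 1 KB file -> 1 split, 1 mapper
print(num_splits(1024 * 1024 * 1024))  # a 1 GB file -> 16 splits
```

With 16 splits the scheduler has enough map tasks to hand out work to every node in a three-machine cluster, which is why a bigger input is the first thing to try.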
You can also check to make sure that every TaskTracker is reporting properly to the JobTracker. If there is a TaskTracker that is not properly connected, it will not get tasks:
$ hadoop job -list-active-trackers
This command should output all 3 of your hosts.
Upvotes: 1