Reputation: 91959
I have a Hadoop cluster of three machines,
where one machine acts as both master and slave.
When I run the wordcount example, it runs map tasks on two machines, worker1
and worker2.
But when I run my own code, it runs on only one machine, worker1.
How can I make map tasks run on all machines?
Input Split Locations
/default-rack/master
/default-rack/worker1
/default-rack/worker2
FIXED!!!
I added the following to my mapred-site.xml
configuration and it solved the problem:
<property>
  <name>mapred.map.tasks</name>
  <value>100</value>
</property>
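(Side note for anyone on a newer release: in Hadoop 2.x this property was renamed to mapreduce.job.maps, so the equivalent entry would presumably look like the fragment below. Either way, the value is only a hint to the framework, not a hard guarantee of the number of map tasks.)

```xml
<property>
  <name>mapreduce.job.maps</name>
  <value>100</value>
</property>
```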
Upvotes: 2
Views: 1287
Reputation: 39893
How big is your input? Hadoop splits a job into input splits, and if your file is too small, it will produce only one split.
Try a larger file, say around 1 GB in size, and see how many mappers you get then.
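To make the split arithmetic concrete, here is a rough back-of-the-envelope sketch (not Hadoop's actual FileInputFormat code, which also honours min/max split sizes) assuming splits follow the HDFS block size, with 64 MB as the old default:

```python
# Rough sketch of how the number of input splits scales with file size,
# assuming one split per HDFS block (64 MB was the Hadoop 1.x default).
# The real FileInputFormat logic is more nuanced, but the intuition holds:
# a tiny file -> one split -> one map task -> one busy node.
BLOCK_SIZE = 64 * 1024 * 1024  # assumed default dfs.block.size

def num_splits(file_size_bytes, split_size=BLOCK_SIZE):
    """One split per full block, plus one for any remainder."""
    if file_size_bytes == 0:
        return 0
    return -(-file_size_bytes // split_size)  # ceiling division

print(num_splits(1024))                # a 1 KB file -> 1 split, 1 mapper
print(num_splits(1024 * 1024 * 1024))  # a 1 GB file -> 16 splits
```

With 16 splits the scheduler has enough map tasks to hand out work to every node in a three-machine cluster, which is why a bigger input is the first thing to try.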
You can also check to make sure that every TaskTracker is reporting properly to the JobTracker. If there is a TaskTracker that is not properly connected, it will not get tasks:
$ hadoop job -list-active-trackers
This command should output all 3 of your hosts.
Upvotes: 1