Reputation: 139
Can one TaskTracker run multiple JVMs?
Here is the scenario:
Assume there are 2 files (A & B) and 2 Data nodes (D1 & D2).
When you load A, assume it is split into blocks A1 & A2, stored on D1 & D2 respectively. When you load B, assume it is split into B1 & B2, again on D1 & D2.
Suppose D1 is busy with other tasks and D2 is available, and two jobs are submitted, one using file A and the other using file B.
So now D2 is available and has blocks A2 & B2. Will the JobTracker submit the code to the TaskTracker on D2 and run the tasks for A2 and B2 at the same time, or will it run A2 first and start B2 only after A2 finishes?
If so, is it possible to run both tasks in parallel, which would mean one TaskTracker and two JVMs, or will it spawn two TaskTrackers on D2?
Upvotes: 1
Views: 242
Reputation: 33495
A TaskTracker (TT) can launch multiple map or reduce tasks in parallel on a single machine. By default, a TT runs up to 2 map tasks (mapreduce.tasktracker.map.tasks.maximum) and 2 reduce tasks (mapreduce.tasktracker.reduce.tasks.maximum) at a time. To change these limits, override the properties in mapred-site.xml; mapred-default.xml only lists the default values.
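As a rough illustration, the override could look like the fragment below. The values 4 and 2 are arbitrary examples, not recommendations, and the property names are the ones quoted above; older Hadoop 1.x releases spell them mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, so check which form your version expects.

```xml
<!-- mapred-site.xml on the TaskTracker node (illustrative values only) -->
<configuration>
  <property>
    <!-- maximum number of map tasks this TaskTracker runs at once -->
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <!-- maximum number of reduce tasks this TaskTracker runs at once -->
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

These are per-node settings, so the TaskTracker most likely needs a restart before new values take effect.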
Upvotes: 0
Reputation: 8705
By default, the TaskTracker spawns a separate JVM for each task. You can reuse JVMs across tasks of the same job by setting the configuration parameter mapred.job.reuse.jvm.num.tasks (the default is 1, meaning no reuse; -1 means no limit).
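A minimal sketch of that setting, assuming it is placed in mapred-site.xml or in the job's own configuration; the value -1 is just one possible choice, and newer Hadoop releases name this property mapreduce.job.jvm.numtasks:

```xml
<!-- illustrative fragment: let one JVM run any number of tasks from the same job -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```

Reuse applies only to tasks of the same job that run one after another; tasks running in parallel on a TaskTracker still each get their own JVM.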
Upvotes: 1