Can 1 TaskTracker run multiple JVMs?

Question

Can one TaskTracker run multiple JVMs?

Here is the scenario:

Assume there are 2 files (A and B) and 2 DataNodes (D1 and D2).

Suppose that when you load A, it is split into blocks A1 and A2, stored on D1 and D2, and that when you load B, it is split into blocks B1 and B2, also stored on D1 and D2.

Now assume that D1 is busy with other tasks while D2 is available, and that two jobs are submitted: one that uses file A and one that uses file B.

D2 is available and holds blocks A2 and B2. Will the JobTracker send the code to the TaskTracker on D2 and run the tasks for A2 and B2 at the same time, or will it run A2 first and start B2 only after A2 finishes?

If it runs them one after the other, is it possible to run both tasks in parallel, meaning 1 TaskTracker with 2 JVMs, or would D2 need to create/spawn 2 TaskTrackers?

Praveen Sripati · Accepted Answer

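A TaskTracker (TT) can launch multiple map or reduce tasks in parallel on a single machine. By default each task runs in its own child JVM, so a single TT manages several JVMs; no additional TaskTracker is spawned. By default a TT runs up to 2 map tasks (mapreduce.tasktracker.map.tasks.maximum) and 2 reduce tasks (mapreduce.tasktracker.reduce.tasks.maximum) concurrently. To change these limits, override the properties in mapred-site.xml; mapred-default.xml only holds the shipped defaults and should not be edited. In the scenario above, if the TT on D2 has two free map slots, the map tasks for A2 and B2 can run at the same time, subject to the job scheduler.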
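To make the slot idea concrete, here is a toy Python model of a TaskTracker with a fixed number of map slots, where each running entry stands in for one child JVM. It is not Hadoop code; the class `TaskTrackerModel` and its methods are hypothetical names chosen for illustration. With the default of 2 slots, A2 and B2 launch together; with 1 slot, B2 waits until A2 finishes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskTrackerModel:
    """Toy model of an MR1 TaskTracker's map slots (hypothetical, not Hadoop's API)."""

    # Plays the role of mapreduce.tasktracker.map.tasks.maximum (default 2).
    map_slots: int = 2
    # Each entry stands for one running task, i.e. one child JVM.
    running: List[str] = field(default_factory=list)

    def free_slots(self) -> int:
        return self.map_slots - len(self.running)

    def assign(self, pending: List[str]) -> List[str]:
        """Launch as many pending tasks as there are free slots; return the launched ones."""
        launched = pending[: self.free_slots()]
        self.running.extend(launched)
        del pending[: len(launched)]  # launched tasks leave the queue
        return launched

    def finish(self, task: str) -> None:
        """Free the slot (and its JVM) used by a completed task."""
        self.running.remove(task)


if __name__ == "__main__":
    # Default: 2 map slots, so A2 and B2 run in parallel in two JVMs.
    tt = TaskTrackerModel()
    print(tt.assign(["map(A2)", "map(B2)"]))  # ['map(A2)', 'map(B2)']

    # With a single slot, B2 must wait for A2 to finish.
    tt1 = TaskTrackerModel(map_slots=1)
    queue = ["map(A2)", "map(B2)"]
    print(tt1.assign(queue))  # ['map(A2)']
    tt1.finish("map(A2)")
    print(tt1.assign(queue))  # ['map(B2)']
```

The model deliberately ignores data locality, scheduler policy, and reduce slots; it only illustrates that concurrency on one node is bounded by the TT's slot count, not by the number of TaskTrackers.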


