Yu Gu

Reputation: 2493

Does a TaskTracker correspond to a mapper or a reducer in Hadoop?

I know that a mapper always performs multiple map operations and a reducer always performs multiple reduce operations. In other words, the mapping between a mapper (or reducer) and map (or reduce) operations is one-to-many.
Now I have a question: is the mapping between a TaskTracker and mappers one-to-one or one-to-many?

Upvotes: 0

Views: 430

Answers (3)

prasanth

Reputation: 41

In MapReduce, the number of mappers depends on the number of input splits.

There is also one TaskTracker per DataNode.

If there are multiple input splits on a SINGLE NODE, the splits (per the data-locality optimization) are queued and executed in task JVMs on that machine (by default, a node has two slots for these operations).

In that scenario, the mapping from a TaskTracker to its MAPPER(s) is one-to-many.
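The split-to-mapper relationship above can be sketched numerically. This is an illustrative simplification, not Hadoop's actual split logic (which also considers min/max split sizes and a slack factor); it assumes the default case where the split size equals the HDFS block size, 64 MB in Hadoop 1.x, and one map task per split:

```python
import math

# Simplified sketch: one map task per input split, split size defaulting
# to the Hadoop 1.x HDFS block size of 64 MB. Real Hadoop's
# FileInputFormat.getSplits() has more nuance (min/max split size, etc.).
def num_mappers(file_size_bytes, split_size_bytes=64 * 1024 * 1024):
    # Even a tiny non-empty file gets at least one split/mapper.
    return max(1, math.ceil(file_size_bytes / split_size_bytes))
```

For example, a 200 MB file yields `ceil(200/64) = 4` splits, hence 4 map tasks; if all four splits live on one DataNode, its single TaskTracker runs all four (queued through its slots) — the one-to-many relationship described above.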

Upvotes: 0

Rijul

Reputation: 1445

First of all, I will explain exactly what a TaskTracker is:

A TaskTracker is a node in the cluster that accepts tasks - Map, Reduce and Shuffle operations - from a JobTracker.

Every TaskTracker is configured with a set of slots; these indicate the number of tasks it can accept. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data, and failing that, for an empty slot on a machine in the same rack.

The TaskTracker spawns separate JVM processes to do the actual work; this ensures that a process failure does not take down the TaskTracker itself. The TaskTracker monitors these spawned processes, capturing their output and exit codes. When a process finishes, successfully or not, the TaskTracker notifies the JobTracker. TaskTrackers also send heartbeat messages to the JobTracker, usually every few seconds, to reassure it that they are still alive. These messages also report the number of available slots, so the JobTracker stays up to date on where in the cluster work can be delegated.
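The slot-selection preference described above (data-local first, then rack-local, then anywhere) can be sketched as follows. This is a hypothetical simplification, not actual JobTracker code; the function name, data structures, and the assumption of one TaskTracker per node are all mine:

```python
# Hypothetical sketch of JobTracker-style slot selection (NOT Hadoop code).
# trackers:    dict mapping node name -> number of free map slots
# split_hosts: set of node names holding a replica of the input split
# rack_of:     dict mapping node name -> rack name
def pick_tasktracker(trackers, split_hosts, rack_of):
    # 1. Data-local: a free slot on a node that hosts the split's data.
    for node, free_slots in trackers.items():
        if free_slots > 0 and node in split_hosts:
            return node, "data-local"
    # 2. Rack-local: a free slot in the same rack as any replica.
    replica_racks = {rack_of[h] for h in split_hosts}
    for node, free_slots in trackers.items():
        if free_slots > 0 and rack_of[node] in replica_racks:
            return node, "rack-local"
    # 3. Fallback: any free slot anywhere in the cluster.
    for node, free_slots in trackers.items():
        if free_slots > 0:
            return node, "off-rack"
    return None, None  # no free slots at all
```

So if the node holding the data has no free slots, the task still runs, just with weaker locality, on another machine in the same rack if possible.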

And yes, this leads us to the point that one TaskTracker performs many operations for the JobTracker (the actual jobs, i.e. map and reduce tasks), so the answer to your question would be:

a one (JobTracker) to many (TaskTracker) relation

Upvotes: 2

BDBoss

Reputation: 120

The last line is not correct.

The correction is: there is one TaskTracker per DataNode in the cluster, and only one JobTracker per NameNode in the cluster. This assumes you are on an MRv1 (non-YARN) Hadoop cluster (Hadoop 1.x).

Upvotes: 1
