Vijay Kumar

Reputation: 1

How many times are input splits replicated in HDFS?

Each input split is replicated 3 times in a Hadoop cluster. Does Hadoop assign a map task to each replica of the split? If so, which map task's results are sent to the reduce function? Does Hadoop replicate the reduce function as well?

Upvotes: 0

Views: 243

Answers (1)

sun_dare

Reputation: 1166

No. Even though the data for a split has three replicas, the MapReduce engine assigns only one map task to that split. It uses data locality to decide which replica of the split to read.

Hadoop does its best to run the map task on a node where the input data resides in HDFS. This is called the data locality optimization since it doesn’t use valuable cluster bandwidth. Sometimes, however, all three nodes hosting the HDFS block replicas for a map task’s input split are running other map tasks so the job scheduler will look for a free map slot on a node in the same rack as one of the blocks. Very occasionally even this is not possible, so an off-rack node is used, which results in an inter-rack network transfer.
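The preference order described above (node-local, then rack-local, then off-rack) can be sketched roughly as follows. This is only an illustration of the idea, not Hadoop's actual scheduler code: the function `pick_node`, the `rack_of` mapping, and the node and rack names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pick_node(
    replica_nodes: List[str],
    free_nodes: List[str],
    rack_of: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Pick one node for the single map task of a split.

    replica_nodes: nodes holding a replica of the split's block (hypothetical names)
    free_nodes:    nodes that currently have a free map slot
    rack_of:       hypothetical mapping from node name to rack name
    Returns (node, locality_level), or None if no slot is free anywhere.
    """
    # 1. Node-local: a node that already stores a replica has a free slot.
    for node in replica_nodes:
        if node in free_nodes:
            return node, "node-local"

    # 2. Rack-local: a free node in the same rack as one of the replicas.
    replica_racks = {rack_of[n] for n in replica_nodes}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"

    # 3. Off-rack: any free node; the data crosses racks over the network.
    for node in free_nodes:
        return node, "off-rack"

    return None


if __name__ == "__main__":
    racks = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r3"}
    # All replica holders are busy; n2 shares rack r1 with replica holder n1.
    print(pick_node(["n1", "n3"], ["n2", "n4"], racks))
```

Whichever branch is taken, exactly one node is returned, so the split is still processed by a single map task no matter how many replicas exist.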

Please see the excerpt below from Hadoop: The Definitive Guide.

Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits, or just splits. Hadoop creates one map task for each split, which runs the user-defined map function for each record in the split.
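To make the "one map task for each split" point concrete, here is a small arithmetic sketch. The helper `count_map_tasks` is hypothetical, and the 128 MB split size is an assumption chosen for illustration; real input formats may treat the final partial split slightly differently. The point is that the replication factor does not enter the calculation.

```python
MB = 1024 * 1024


def count_map_tasks(file_size_bytes: int, split_size_bytes: int, replication: int = 3) -> int:
    """Estimate the number of map tasks for one input file.

    `replication` is accepted only to show that it is deliberately ignored:
    replicas are extra copies of the same data, not extra splits.
    """
    if file_size_bytes <= 0:
        return 0
    # Ceiling division: one split (and one map task) per full or partial piece.
    return -(-file_size_bytes // split_size_bytes)


if __name__ == "__main__":
    # Same file, different replication factors, same number of map tasks.
    print(count_map_tasks(1024 * MB, 128 * MB, replication=3))
    print(count_map_tasks(1024 * MB, 128 * MB, replication=1))
```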

Upvotes: 1
