Reputation: 708
How does load balancing work in the Hadoop environment? I have just started reading about Hadoop, and I would like to know how load balancing works across the entire ecosystem.
Upvotes: 0
Views: 1218
Reputation: 506
Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits, or just splits. Hadoop creates one map task for each split, which runs the user-defined map function for each record in the split. Having many splits means the time taken to process each split is small compared to the time to process the whole input. So if we are processing the splits in parallel, the processing is better load-balanced if the splits are small, since a faster machine will be able to process proportionally more splits over the course of the job than a slower machine. Even if the machines are identical, failed processes or other jobs running concurrently make load balancing desirable, and the quality of the load balancing increases as the splits become more fine-grained.

On the other hand, if splits are too small, then the overhead of managing the splits and of map task creation begins to dominate the total job execution time. For most jobs, a good split size tends to be the size of an HDFS block, 64 MB by default, although this can be changed for the cluster (for all newly created files), or specified when each file is created.
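As a minimal sketch of how you can influence split size for a single job (the input path and class name below are hypothetical placeholders; the `FileInputFormat` split-size methods are part of the standard `org.apache.hadoop.mapreduce` API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "split-size-demo");

            // Hypothetical input directory on HDFS.
            FileInputFormat.addInputPath(job, new Path("/user/demo/input"));

            // By default the split size follows the HDFS block size, so one map
            // task is created per block. These bounds force larger or smaller
            // splits, trading load-balancing granularity against per-task overhead.
            FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB
            FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB

            // ... set mapper/reducer classes and the output path as usual,
            // then submit the job with job.waitForCompletion(true).
        }
    }

Smaller splits give the scheduler more, finer-grained units to spread across nodes (better balance); larger splits reduce the number of map tasks and the per-task startup overhead.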
Upvotes: 1