Praveen Gr

Reputation: 197

In Hadoop, what is the relationship between the replication factor and the number of nodes in the cluster?

For example, if the replication factor is 3 and there are 2 nodes in the cluster, how many replicas will be created? How will they be placed?

Upvotes: 1

Views: 1173

Answers (1)

PradeepKumbhar

Reputation: 3421

Setting a replication factor greater than the number of available datanodes defeats the purpose of replication. The replicas of a block should each be placed on a distinct datanode. If one datanode held more than one replica of the same block (theoretically), that would not provide any additional fault tolerance, because if that node goes down, all of those replicas are lost together. So one replica per node is enough.

And to answer your questions:

  1. What is the relationship between the replication factor and the number of datanodes in the cluster? Ans. The replication factor should be less than or equal to the number of datanodes.

  2. If the replication factor is 3 and there are 2 nodes in the cluster, how many replicas will be created?
    Ans. As far as I tried, only 2 replicas are created. (You can try this with the hdfs dfs -setrep command.)

  3. How will they be placed? Ans. They will be placed one per datanode.

Hence, when you set a replication factor greater than the number of datanodes, the extra replicas that cannot be created will be reported as Missing replicas in the hdfs fsck output. The corresponding blocks will also be treated as Under-Replicated Blocks.
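
To make the arithmetic above concrete, here is a minimal Python sketch of the counting described in this answer. It is only a simplified model, not Hadoop code: the function name `replica_summary` and its fields are made up for illustration, and it assumes at most one replica of a block per datanode while ignoring rack awareness, disk space, and decommissioning nodes.

```python
def replica_summary(replication_factor: int, live_datanodes: int) -> dict:
    """Simplified model of one block's replicas (hypothetical helper, not an HDFS API).

    Assumption: a datanode holds at most one replica of a given block,
    so the number of replicas can never exceed the number of datanodes.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if live_datanodes < 0:
        raise ValueError("live_datanodes cannot be negative")

    # Replicas that can actually be placed, one per datanode.
    actual = min(replication_factor, live_datanodes)
    # Replicas that were requested but have nowhere to go.
    missing = replication_factor - actual

    return {
        "requested": replication_factor,
        "actual": actual,
        "missing": missing,
        "under_replicated": missing > 0,
    }


# The case from the question: replication factor 3 on a 2-node cluster.
summary = replica_summary(replication_factor=3, live_datanodes=2)
print(summary)
```

For the case in the question, this model gives 2 actual replicas and 1 missing replica, which matches the behaviour described above. On a real cluster, the hdfs fsck output is the place to check the actual numbers.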

Upvotes: 4
