Reputation: 105
I am new to Hadoop and I want to understand how to determine the highest replication factor we can have for a given cluster. I know that the default setting is 3 replicas, but if I have a cluster with 5 nodes, what is the highest replication factor that I can use? Is there a formula to follow to determine the replication factor?
Thank you
Upvotes: 2
Views: 1253
Reputation: 11
In the Hadoop environment, the default replication factor is 3, which assumes a cluster of 3 or more slave machines. A simple formula for the maximum is: 'N' Replication Factor = 'N' Slave Nodes, i.e. the replication factor can be at most the number of slave nodes. Here is more info about replication: http://commandstech.com/replication-factor-in-hadoop/
Upvotes: 0
Reputation: 4575
The highest replication factor that you can use is a function of the number of nodes in your cluster (as @Tarik said, you cannot have more replicas than nodes in your cluster), your expected usage (how much data you plan to store), and your cluster's storage capacity.
This other SO question has some calculations on capacity and storage use.
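As a rough illustration of how those three factors combine, here is a minimal Python sketch. It is not a Hadoop API: the function name `max_replication_factor` and its parameters are made up for this example, and it assumes every node has the same usable capacity, that blocks are spread evenly, and that no headroom is reserved for temporary or non-HDFS data.

```python
TB = 1024 ** 4  # bytes in one tebibyte


def max_replication_factor(num_datanodes, usable_bytes_per_node, data_bytes):
    """Estimate the largest replication factor that fits the cluster.

    Hypothetical helper, not part of Hadoop. Assumes identical nodes and
    evenly distributed blocks.
    """
    if num_datanodes <= 0 or usable_bytes_per_node < 0 or data_bytes <= 0:
        raise ValueError("node count and data size must be positive")

    # Capacity limit: replicas * data size must fit in the total raw storage.
    total_capacity = num_datanodes * usable_bytes_per_node
    by_capacity = total_capacity // data_bytes

    # Node limit: at most one replica of a block per node.
    return min(num_datanodes, by_capacity)


if __name__ == "__main__":
    # 5 nodes with 4 TB each, storing 6 TB of data: capacity is the limit.
    print(max_replication_factor(5, 4 * TB, 6 * TB))
    # Same cluster storing only 2 TB: the node count is the limit.
    print(max_replication_factor(5, 4 * TB, 2 * TB))
```

A result of 0 means the data does not fit even once. In practice you would also leave free space for intermediate job output, so treat the number as an upper bound rather than a recommendation.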
Upvotes: 1
Reputation: 11209
Obviously you cannot have more replicas than nodes, as storing two copies of the same block on the same node is useless. So the number of nodes seems to me to be the upper limit.
Upvotes: 0