nj2012

Reputation: 105

Replication factor

I am new to Hadoop and I want to understand how to determine the highest replication factor we can have for a given cluster. I know that the default setting is 3 replicas, but if I have a cluster with 5 nodes, what is the highest replication factor that I can use? Is there a formula to follow to determine the replication factor?

Thank you

Upvotes: 2

Views: 1253

Answers (3)

ss sreekanth

Reputation: 11

In a Hadoop environment, the default replication factor is 3, which assumes 3 or more slave machines. A simple rule of thumb: a replication factor of N needs N slave nodes. More information about replication: http://commandstech.com/replication-factor-in-hadoop/

Upvotes: 0

cabad

Reputation: 4575

The highest replication factor that you can use is a function of the number of nodes in your cluster (as @Tarik said, you cannot have more replicas than nodes in your cluster), your expected usage (how much data you plan to store), and your cluster's storage capacity.
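To make that concrete, here is a rough sketch of the bound in Python. It is a back-of-the-envelope estimate, not how Hadoop itself computes anything: the function name and parameters are made up for illustration, and it assumes all raw capacity is available to HDFS (no space reserved for non-DFS use, temporary files, or growth).

```python
import math


def max_replication_factor(num_datanodes: int,
                           raw_capacity_tb: float,
                           data_tb: float) -> int:
    """Estimate the highest usable replication factor.

    Hypothetical helper, under two simplifying assumptions:
      1. At most one replica of a block lives on any one node,
         so the factor cannot exceed the number of nodes.
      2. Every replica is a full copy, so factor * data_tb must
         fit in the cluster's raw capacity.
    """
    if num_datanodes < 1 or raw_capacity_tb <= 0 or data_tb <= 0:
        raise ValueError("all inputs must be positive")

    # How many full copies of the data fit in raw storage.
    limit_by_capacity = math.floor(raw_capacity_tb / data_tb)

    # The tighter of the two limits wins.
    return min(num_datanodes, limit_by_capacity)


# 5 nodes, 10 TB raw, 4 TB of data: capacity is the limit.
print(max_replication_factor(5, 10, 4))    # 2
# 5 nodes, 100 TB raw, 2 TB of data: node count is the limit.
print(max_replication_factor(5, 100, 2))   # 5
```

A result of 0 means the data does not fit in the cluster even once. In practice you would leave headroom well below this bound.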

This other SO question has some calculations on capacity and storage use.

Upvotes: 1

Tarik

Reputation: 11209

Obviously you cannot have more replicas than nodes, since storing two copies on the same node is useless. So the number of nodes seems to me to be the upper limit.

Upvotes: 0
