Reputation: 161
I am a relative newbie to hadoop and want to get a better understanding of how replication works in HDFS.
Say that I have a 10 node system(1 TB each node), giving me a total capacity of 10 TB. If I have a replication factor of 3, then I have 1 original copy and 3 replicas for each file. So, in essence, only 25% of my storage is original data. So my 10 TB cluster is in effect only 2.5 TB of original(un-replicated) data.
Please let me know if my train of thought is correct.
Upvotes: 2
Views: 1619
Reputation: 51309
Your thinking is a little off. A replication factor of 3 means that you have 3 total copies of your data. More specifically, there will be 3 copies of each block for your file, so if your file is made up of 10 blocks there will be 30 total blocks across your 10 nodes, or about 3 blocks per node.
You are correct in thinking that a 10x1TB cluster has less than 10TB capacity- with a replication factor of 3, it actually has a functional capacity of about 3.3TB, with a little less actual capacity because of space needed for doing any processing, holding temporary files, etc.
Upvotes: 5