Reputation: 811
I have a 1 GB file to be stored on HDFS. My cluster has 10 DataNodes and a NameNode. Is there any calculation the NameNode uses to pick a particular number of DataNodes for storing the file (not counting replicas)? Or is there a parameter we can configure for file storage? If so, what is the default number of DataNodes Hadoop uses to store a file when nothing is specifically configured?
I want to know whether it uses all the DataNodes in the cluster or only a specific number of them.
Assume the HDFS block size is 64 MB and that free space is available on all the DataNodes.
Thanks in advance.
Upvotes: 1
Views: 4832
Reputation: 1288
In second-generation Hadoop (Hadoop 2.x):
If the configured block size is 128 MB and you have a 1 GB file, the file size is 1024 MB.
The number of blocks needed is 1024 / 128 = 8, so your 1 GB file is split into 8 blocks.
Now, say you have a 10-node cluster. The default replication factor is 3, which means each block is stored on 3 different DataNodes. So the total number of block replicas for your 1 GB file is *8 * 3 = 24 blocks*.
Since each block is 128 MB, the total storage your 1 GB file consumes is *128 * 24 = 3072 MB*.
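To make the arithmetic explicit, here is a minimal sketch in plain Java (no Hadoop APIs; the sizes and replication factor are just the values from this answer):

```java
public class BlockMath {
    public static void main(String[] args) {
        long fileSizeMb = 1024;  // 1 GB file
        long blockSizeMb = 128;  // configured block size
        int replication = 3;     // default dfs.replication

        // Round up: a partial last block still counts as its own block.
        long blocks = (fileSizeMb + blockSizeMb - 1) / blockSizeMb;  // 8
        long replicas = blocks * replication;                        // 24
        long rawMb = replicas * blockSizeMb;                         // 3072

        System.out.printf("blocks=%d, replicas=%d, raw storage=%d MB%n",
                blocks, replicas, rawMb);
    }
}
```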
Upvotes: 0
Reputation: 666
If the configured block size is 64 MB and you have a 1 GB file, the file size is 1024 MB.
The number of blocks needed is 1024 / 64 = 16, so your 1 GB file is split into 16 blocks.
Now, say you have a 10-node cluster. The default replication factor is 3, which means each block is stored on 3 different DataNodes, so the total number of block replicas for your 1 GB file is *16 * 3 = 48 blocks*. Note that the NameNode places each block's replicas independently, so those 48 replicas can end up spread across all 10 DataNodes of the cluster, not just 3.
Since each block is 64 MB, the total storage your 1 GB file consumes is *64 * 48 = 3072 MB*.
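On the configuration part of your question: both the block size and the replication factor are per-file settings, not fixed cluster constants. Here is a minimal sketch (assuming a reachable HDFS cluster; the destination path is hypothetical) of overriding them through Hadoop's public FileSystem API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutWithBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up hdfs-site.xml defaults
        try (FileSystem fs = FileSystem.get(conf)) {
            Path dst = new Path("/user/demo/onegig.dat"); // hypothetical path
            // create(path, overwrite, bufferSize, replication, blockSize)
            try (FSDataOutputStream out =
                     fs.create(dst, true, 4096, (short) 3, 64L * 1024 * 1024)) {
                // As data is written, the NameNode allocates a fresh 64 MB block
                // (with its own pipeline of 3 DataNodes) each time one fills up.
                out.write(new byte[4096]); // placeholder payload; a real 1 GB write spans 16 blocks
            }
        }
    }
}
```

If you don't set them per file, the cluster-wide defaults `dfs.blocksize` and `dfs.replication` from hdfs-site.xml apply.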
Hope that clears your doubt.
Upvotes: 1