Reputation: 8860
With regards to the HDFS, I read from their site under the Data Replication section (below link) that
http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Data+Replication
'all blocks in a file except the last block are the same size'
Could you please let me know what is the reason the last block would not be of the same size?
Could it be that the total memory allocation may play a part over here?
However if the memory size is not an issue, would then still the last block would not of the same size as the rest of the blocks for a file?
And if yes, could you please elaborate a bit on this?
Any link to the JIRA for the development effort for this would be greatly appreciated.
Upvotes: 2
Views: 147
Reputation: 3173
Actually this is not at all an issue. Indeed it is uncertain that the last block of the file can be in the same size.
Consider a file of size 1000 MB and the block is 128MB, then the file will be splitted into 8 Blocks, where the first 7 blocks will be in even size which is equal to 128MB.
The total size of the 7 blocks will be 896MB (7*128MB), hence remaining size will be 104MB (1000-896). So the last block's actual size will be 104 MB wherein other 7 blocks are of 128 MB.
The namenode will allocate data blocks for every chunk of the file being stored on HDFS. It will not make any consideration for the chunks which's size is less than the data block size.
HDFS is designed to store chunks of data in equally sized data blocks so that the data blocks available on data nodes can be easily calculated and maintained by the namenode.
Upvotes: 1