Reputation: 372
I have been working with Hadoop HDFS for quite some time and I understand how HDFS blocks work (64 MB, 128 MB), but I am still not clear about blocks in other file systems; for example, a hard disk drive typically uses a block size of 4 KB.
So my understanding is that all storage systems use blocks to store data, even a mobile SD card, but Hadoop uses a much bigger block size in order to handle massive volumes of data. Is that correct?
Please let me know if there is any documentation comparing the different block storage systems.
Upvotes: 2
Views: 1151
Reputation: 161
HDFS is basically an abstraction over the existing local file system (LFS), which means a 64 MB/128 MB HDFS block is ultimately stored as 4 KB blocks in the LFS. The reason the HDFS block size is large is to minimize seeks: an HDFS block is stored in contiguous locations on the local file system (next to one another), so the total time to read it is the time to seek to the start of the block on the LFS plus the time to read its contents, with no further seeks because the data is contiguous.
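As a quick way to see that the block size is an HDFS-level property rather than a disk property, here is a minimal sketch using the Hadoop Java FileSystem API; the path and the printed values are hypothetical examples, not anything from your cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path, used only for illustration.
        Path file = new Path("/user/example/data.txt");
        long blockSize = fs.getFileStatus(file).getBlockSize();

        System.out.println("HDFS block size of " + file + ": "
                + (blockSize / (1024 * 1024)) + " MB");
        // Typically prints 64 or 128, while the local file system
        // underneath (ext4, xfs, ...) still stores data in 4 KB blocks.
    }
}
```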
This means we read data at something close to the disk transfer rate, while spending a minimal amount of time seeking.
This is very helpful in MapReduce jobs, where we have to read a lot of data and then process it, so minimizing seek time gives a good performance boost.
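To make the seek argument concrete, here is a rough back-of-the-envelope sketch; the 10 ms seek time and 100 MB/s transfer rate are assumed example figures for a spinning disk, not measurements:

```java
public class SeekOverhead {
    public static void main(String[] args) {
        double seekMs = 10.0;          // assumed average seek time (example figure)
        double transferMBps = 100.0;   // assumed sequential transfer rate (example figure)

        double[] blockSizesMB = {0.004, 64, 128};   // 4 KB, 64 MB, 128 MB
        for (double mb : blockSizesMB) {
            double transferMs = (mb / transferMBps) * 1000.0;
            double seekFraction = seekMs / (seekMs + transferMs);
            System.out.printf("%8.3f MB block: seek is %5.1f%% of the read time%n",
                    mb, seekFraction * 100.0);
        }
        // 4 KB block   -> ~99.6% of the read time is spent on the seek
        // 128 MB block -> <1%, i.e. the read runs close to the transfer rate
    }
}
```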
Also, HDFS is meant to handle large files. Let's say you have a 1 GB file. With a 4 KB block size you would need 262,144 requests to read that file, and in HDFS each of those requests goes across the network to the NameNode to find out where the block is located. With 64 MB blocks the number of requests drops to 16. Another reason for using a large block size is to reduce the stress on the NameNode: since the NameNode keeps the metadata for every file's blocks, a small block size means it is very easily overwhelmed by the sheer number of block records.
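The request/metadata arithmetic can be sketched like this (plain arithmetic, no Hadoop API; binary sizes with 1 GB = 1024 MB are assumed):

```java
public class BlockCount {
    public static void main(String[] args) {
        long fileSize = 1L * 1024 * 1024 * 1024;   // 1 GB file
        long[] blockSizes = {                      // 4 KB, 64 MB, 128 MB
                4L * 1024, 64L * 1024 * 1024, 128L * 1024 * 1024};

        for (long bs : blockSizes) {
            long blocks = (fileSize + bs - 1) / bs;   // ceiling division
            System.out.printf("block size %,11d bytes -> %,8d blocks%n", bs, blocks);
        }
        // 4 KB   -> 262,144 blocks (one NameNode lookup and one metadata entry each)
        // 64 MB  ->        16 blocks
        // 128 MB ->         8 blocks
    }
}
```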
These links will also help you understand HDFS vs LFS better.
Upvotes: 3