Reputation: 11
As I understand it, HDFS is a higher-level file system that abstracts the local file system and uses a large block size (e.g. 64 MB). When a client wants to write a file to HDFS, a pipeline of DataNodes is formed based on the replication factor.
The HDFS client then buffers data up to the block size (e.g. 64 MB) and streams it in small packets (around 4 KB) to the first DataNode in the pipeline, which forwards them to the remaining DataNodes. The resulting blocks are stored as ordinary files on the local file system of each DataNode.
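For context, here is a minimal sketch of what a client write looks like through Hadoop's Java FileSystem API; the NameNode URI, path, and configuration values are assumptions for illustration, and the packet/pipeline mechanics described above happen inside the client library, not in application code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("dfs.replication", "3");      // replication factor -> pipeline of 3 DataNodes
            conf.set("dfs.blocksize", "67108864"); // 64 MB block size

            // NameNode URI and file path are hypothetical placeholders.
            try (FileSystem fs = FileSystem.get(new URI("hdfs://namenode:8020"), conf);
                 FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
                // The client library splits this stream into packets and pushes them
                // down the DataNode pipeline; the application only sees a stream.
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
        }
    }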
I would like to know:
Upvotes: 1
Views: 89