Reputation: 1
I am a newbie to Hadoop. I'm confused about who does the splitting of the input file. Let's assume I have a 200 MB file and the block size is 64 MB, so we need a total of 4 blocks, multiplied by the replication factor. Who splits the file, and how are the split blocks made available to the client so it can write them to the DataNodes?
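For reference, the block arithmetic works out like this (a minimal sketch; the replication factor of 3 is the usual HDFS default, not something stated above):

```python
import math

FILE_SIZE_MB = 200
BLOCK_SIZE_MB = 64
REPLICATION = 3  # assumed: HDFS default replication factor

# The last block may be smaller than 64 MB; HDFS does not pad it.
blocks = math.ceil(FILE_SIZE_MB / BLOCK_SIZE_MB)
replicas = blocks * REPLICATION

print(blocks)    # 4 blocks: 64 + 64 + 64 + 8 MB
print(replicas)  # 12 block replicas stored across the cluster
```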
If at all possible, please provide links to this information. I tried Googling but was unsuccessful in finding a detailed step-by-step explanation of the Hadoop architecture. There are a couple of sites, but they are missing the details.
Upvotes: 0
Views: 179
Reputation: 4575
Though some details have changed over the years, these two documents (written by people involved in the early development of HDFS) provide a very good description of how things work in HDFS:
To answer your specific question: the HDFS middleware (specifically, the HDFS client component) splits files into blocks before uploading them to the DataNodes, and joins the blocks back together when you read a file. This is completely transparent to the user.
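To make the idea concrete, here is a minimal Python sketch of what client-side splitting looks like conceptually. This is not the actual HDFS client code (which is Java and also handles checksums, packet streaming, and the DataNode write pipeline); it just illustrates reading a stream in block-sized chunks, using a tiny 64-byte "block" so the example is self-contained:

```python
import io

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, as in the question

def split_into_blocks(stream, block_size=BLOCK_SIZE):
    """Yield successive block-sized chunks from a stream, the way the
    HDFS client carves a file into blocks before writing each block
    to its pipeline of DataNodes. The final chunk may be shorter."""
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        yield chunk

# Illustration: a 200-byte in-memory "file" with a 64-byte "block size"
# mirrors the 200 MB / 64 MB case from the question.
data = io.BytesIO(b"x" * 200)
sizes = [len(b) for b in split_into_blocks(data, block_size=64)]
print(sizes)  # [64, 64, 64, 8] -> 4 blocks, last one partial
```

On the read path the client does the inverse: it asks the NameNode which DataNodes hold each block, then fetches and concatenates the blocks in order.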
Upvotes: 1