Reputation: 373
Assuming default Hadoop settings, if I am writing a 128 MB file into HDFS, the client needs to write 2 blocks. My questions around this are:
Second scenario: a 64 MB file size.
2. Can someone read a block that is currently being written to HDFS, or does a reader have to wait for the write to complete?
Upvotes: 2
Views: 730
Reputation: 21
In general, when you write to HDFS, once more than a block's worth of data has been written, the first block becomes visible to new readers. This is true for subsequent blocks as well: it is always the block currently being written that is not visible to other readers. However, you can use FSDataOutputStream.sync(), which forces all buffers to be flushed to the data nodes. After sync() returns successfully, all data written up to that point is guaranteed to be visible to all new readers.
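A minimal sketch of that pattern (assumptions: a reachable HDFS cluster configured via the default `Configuration`, a hypothetical file path, and a recent Hadoop client where `hflush()` supersedes the deprecated `sync()`):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Path is hypothetical; any writable HDFS location works.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/visibility-demo.txt"))) {
            out.writeBytes("first batch of records\n");
            // Without hflush(), this data may sit in client-side buffers and
            // remain invisible to concurrent readers until the block completes.
            out.hflush();
            // After hflush() returns, everything written so far is guaranteed
            // visible to any reader that opens the file from this point on.
            out.writeBytes("second batch, not yet guaranteed visible\n");
        }
    }
}
```

Note that `hflush()` guarantees visibility, not durability on disk; use `hsync()` if you also need the data forced to the datanodes' storage.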
Upvotes: 0
Reputation: 1574
HDFS thinks in terms of blocks. So if your file is made of 2 blocks and one block has been fully written, you can read that block. But since it is just one block of the file and not the whole file, you will have to locate it under dfs.data.dir and use hadoop dfs -text to read it, or browse to it through the NameNode web UI.
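A rough sketch of that lookup (the file path and block ID here are hypothetical, and the actual dfs.data.dir location depends on your hdfs-site.xml):

```shell
# List the block IDs and datanode locations backing a file
# (/user/me/part-0 is a placeholder path).
hadoop fsck /user/me/part-0 -files -blocks -locations

# On a datanode, completed block files live under dfs.data.dir, e.g.:
ls /data/dfs/data/current/    # look for blk_<id> files

# Or simply read the file contents through HDFS itself:
hadoop dfs -text /user/me/part-0
```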
For the second question: no, you cannot read the block that is currently being written. It will not be visible to other readers.
Upvotes: 1