Reputation: 12013
I'm using the command hadoop fs -put
to copy a huge 100 GB file into HDFS. My HDFS block size is 128 MB, and the copy takes a long time. My question: while the copy is in progress, other users are not able to see the file. Is this by design? How can we give another user access to the partial file so that they too can monitor the copy progress?
Upvotes: 3
Views: 1264
Reputation: 33495
According to Hadoop: The Definitive Guide:
Once more than a block’s worth of data has been written, the first block will be visible to new readers. This is true of subsequent blocks, too: it is always the current block being written that is not visible to other readers.
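A quick way to see what this means for your file: with the sizes from the question, the copy proceeds in a fixed number of block-sized steps, and at most one block (the one currently being written) is invisible to readers at any time. A minimal sketch of the arithmetic:

```shell
# Number of HDFS blocks for a 100 GB file with a 128 MB block size.
file_mb=$((100 * 1024))                          # 102400 MB total
block_mb=128
blocks=$(( (file_mb + block_mb - 1) / block_mb )) # ceiling division
echo "$blocks"                                    # 800 blocks
```

So other readers see the visible size grow in 128 MB increments, 800 steps in total.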
Upvotes: 0
Reputation: 35405
The size is reported block by block. If your block size is 128MB, you'll see the file size as 128MB once the first block is written, then 256MB a while later, and so on until the entire file is copied. So you can use the regular HDFS UI or the command line hadoop fs -ls
to monitor block-by-block copy progress. You can also read the part that has already been copied with hadoop fs -cat
even while the copy is in progress.
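If you know the source file's total size, you can turn the visible size into a rough progress percentage. A sketch, assuming a hypothetical path /data/huge.dat; the hadoop call is shown in a comment and stubbed with an example value here:

```shell
# Progress estimate from the currently visible HDFS file size.
TOTAL_BYTES=$((100 * 1024 * 1024 * 1024))   # known source size: 100 GB

# On a real cluster, fetch the visible size in bytes with:
#   visible=$(hadoop fs -stat %b /data/huge.dat)
# Stubbed here with an example value: two full 128 MB blocks visible.
visible=$((256 * 1024 * 1024))

# Use awk for the division, since the percentage may be below 1.
pct=$(awk -v v="$visible" -v t="$TOTAL_BYTES" 'BEGIN { printf "%.2f", v * 100 / t }')
echo "${pct}% copied"                        # 0.25% copied
```

Run this in a loop (e.g. with watch or sleep) and the percentage advances each time another block completes.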
Upvotes: 1