gameover

Reputation: 12013

Does Hadoop not show incomplete files?

I'm using the command hadoop fs -put to copy a huge 100 GB file into HDFS. My HDFS block size is 128 MB. The copy takes a long time, and while it is in progress, other users cannot see the file. Is this by design? How can another user access this partial file so that they can also monitor the copy's progress?
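For scale, a quick back-of-the-envelope check of how many blocks such a file occupies, using the figures from the question:

```python
# Blocks occupied by a 100 GB file with a 128 MB HDFS block size
# (the figures from the question above).
GB, MB = 1024**3, 1024**2

file_size = 100 * GB
block_size = 128 * MB

blocks = file_size // block_size
print(blocks)  # 800 -- 100 GB divides evenly into 128 MB blocks
```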

Upvotes: 3

Views: 1264

Answers (2)

Praveen Sripati

Reputation: 33495

According to Hadoop: The Definitive Guide:

Once more than a block’s worth of data has been written, the first block will be visible to new readers. This is true of subsequent blocks, too: it is always the current block being written that is not visible to other readers.
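That coherency rule can be modeled with a small sketch (an illustration of the rule, not HDFS's actual code; the 128 MB default matches the block size in the question):

```python
def visible_bytes(bytes_written: int, block_size: int = 128 * 1024**2) -> int:
    """Bytes other readers can see while a write is in progress.

    Only fully written blocks are visible; the block currently
    being written is not, per the coherency model quoted above.
    """
    return (bytes_written // block_size) * block_size

# After 200 MB written, readers see only the first complete block
# (128 MB); after 300 MB, they see two blocks (256 MB).
print(visible_bytes(200 * 1024**2) // 1024**2)  # 128
print(visible_bytes(300 * 1024**2) // 1024**2)  # 256
```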

Upvotes: 0

Hari Menon

Reputation: 35405

The size is reported block by block. So if your block size is 128 MB, you'll see the file size as 128 MB once the first block is complete, then after some time as 256 MB, and so on until the entire file is copied. You can therefore use the regular HDFS web UI or the command line (hadoop fs -ls) to monitor the block-by-block copy progress. You can also read the part that has already been copied with hadoop fs -cat, even while the copy is still in progress.
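A polling sketch of that monitoring approach (illustrative only; it assumes the hadoop CLI is on the PATH with a reachable cluster, and /data/bigfile.dat is a hypothetical path):

```python
import subprocess
import time

def poll_hdfs_size(path: str, interval: int = 30) -> None:
    """Print the visible size of an HDFS file until it stops growing."""
    last = -1
    while True:
        # 'hadoop fs -ls' prints the size in bytes as the 5th column.
        out = subprocess.check_output(["hadoop", "fs", "-ls", path], text=True)
        size = int(out.strip().splitlines()[-1].split()[4])
        print(f"{path}: {size} bytes visible")
        if size == last:  # no growth since the last poll: copy likely done
            break
        last = size
        time.sleep(interval)

# poll_hdfs_size("/data/bigfile.dat")  # hypothetical path
```

The visible size grows in block-sized steps, matching the block-by-block behavior described above.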

Upvotes: 1
