Reputation: 842
HDFS has a default block size of 64MB. Does that mean the minimum size of a file in HDFS is 64MB?
That is, if we create/copy a file that is less than 64MB in size (say, 5 bytes), my assumption is that the actual size of that file on HDFS is one block, i.e. 64MB. But when I copy a 5-byte file to HDFS and then look at its size (through the ls command), I still see the size of that file as 5 bytes. Shouldn't that be 64MB?
Or is the ls command showing the size of the data in the file instead of the block size of the file on HDFS?
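For example (a minimal sketch of what I am doing; the file name and path are made up for illustration):

```sh
# create a 5-byte local file and copy it into HDFS
echo -n "hello" > tiny.txt
hdfs dfs -put tiny.txt /tmp/tiny.txt

# ls reports the file length, not the block size: it still shows 5 bytes
hdfs dfs -ls /tmp/tiny.txt
# -rw-r--r--   3 user supergroup          5 2016-10-27 14:58 /tmp/tiny.txt
```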
Upvotes: 0
Views: 1839
Reputation: 165
The default HDFS block size does not mean that a block will use all of the space we have specified (e.g. 64 MB). If the data is larger than the block size, it is split into blocks, so ceil(data size / block size) blocks are created. The ls command shows only the space the file is actually using.
For example: I uploaded a test.txt file to HDFS with the block size set to 128 MB and replication set to 2, but the actual file size is only 193 B:
| Permission | Owner | Group | Size | Last Modified | Replication | Block Size | Name |
|------------|--------|------------|-------|------------------------|-------------|------------|----------|
| -rw-r--r-- | hduser | supergroup | 193 B | 10/27/2016, 2:58:41 PM | 2 | 128 MB | test.txt |
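The same numbers can be read from the command line with hdfs dfs -stat, which prints the logical file length and the block size separately (a sketch; the path and the echoed output are illustrative):

```sh
# %b = file length in bytes, %o = block size, %r = replication, %n = name
hdfs dfs -stat "%b %o %r %n" /user/hduser/test.txt
# 193 134217728 2 test.txt    (193 bytes stored against a 128 MB block size)
```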
Upvotes: 1
Reputation: 7462
The default block size is the maximum size of a block. Each file consists of blocks, which are distributed (and replicated) to different datanodes on HDFS. The namenode knows which blocks constitute a file, and where to find them.
If a file exceeds 64MB (128MB in newer versions), it cannot be written using a single block; it will need at least two. Of course, if it is less than 64MB it can be written in a single block, which will occupy only as much space as necessary (less than 64MB).
After all, it doesn't make sense for a 5-byte file to occupy 64MB.
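To see this block-level view for yourself, hdfs fsck can list the blocks behind a file (a sketch assuming a 128 MB block size; the path and the commented output are illustrative):

```sh
# a 300 MB file on a cluster with 128 MB blocks is split into
# ceil(300 / 128) = 3 blocks: 128 MB + 128 MB + 44 MB
hdfs fsck /user/hduser/big.dat -files -blocks
# /user/hduser/big.dat 314572800 bytes, 3 block(s): ...
```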
Upvotes: 0