sgsi

Reputation: 392

How is data in an HDFS block stored?

I was reading about HDFS and was wondering whether there is any specific format in which the data in a block is arranged.

Suppose a 265 MB file is copied to a Hadoop cluster and the HDFS block size is 64 MB. The file is then broken into 5 parts (64 MB + 64 MB + 64 MB + 64 MB + 9 MB) and distributed among the data nodes. Correct?

  1. Is there any format within the 64 MB block in which the data is stored?
  2. If the data is stored within the block in some format or structure, then the stored data should be less than 64 MB, since the data structure, headers, etc. would themselves take up some space.
  3. Since an HDFS data node is a logical filesystem (it runs on top of Linux and there is no separate partition for HDFS), all the blocks should be stored as files in the Linux partition. Correct?
  4. How can I find the name of the Linux file that actually stores a 64 MB HDFS block?

If anyone can answer these questions, that would be great. Thanks in advance.

Regards,

(*Vipul)() ;

Upvotes: 3

Views: 2389

Answers (1)

0x0FFF

Reputation: 5018

  1. No, the data is simply split on 64 MB boundaries. Metadata is stored in a small separate file and on the NameNode.
  2. No, each block is exactly the size you specified, and the data is split on exact 64 MB boundaries. If you have 5 parts (64 MB + 64 MB + 64 MB + 64 MB + 9 MB), then the last file is 9 MB and all the others are 64 MB.
  3. Yes, the blocks are stored as files. Each block is a separate file, with a small amount of metadata stored in a companion file.
  4. Run hdfs fsck / -files -blocks -locations to list the blocks of each file and the data nodes that hold them. The block IDs it reports match the blk_* file names on disk.
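To make the splitting in points 1 and 2 concrete, here is a minimal Python sketch of the arithmetic. The function name split_into_blocks is purely illustrative and is not part of any Hadoop API; it only reproduces the sizes from the question.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the block sizes (in bytes) for a file of file_size bytes.

    Hypothetical helper for illustration: every block is full-sized
    except possibly the last one, which holds the remainder.
    """
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


# The 265 MB file from the question with a 64 MB block size.
sizes = split_into_blocks(265 * MB, 64 * MB)
print([s // MB for s in sizes])  # [64, 64, 64, 64, 9]
```

Note that the last block only occupies 9 MB on disk, not a full 64 MB.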

Here's an example of how the block files are stored with 128MB block size:

-rw-r--r--. 1 hdfs hadoop 134217728 Jan 12 09:17 blk_1073741825
-rw-r--r--. 1 hdfs hadoop   1048583 Jan 12 09:17 blk_1073741825_1001.meta
-rw-r--r--. 1 hdfs hadoop 134217728 Jan 12 09:18 blk_1073741826
-rw-r--r--. 1 hdfs hadoop   1048583 Jan 12 09:18 blk_1073741826_1002.meta
-rw-r--r--. 1 hdfs hadoop 134217728 Jan 12 09:18 blk_1073741827
-rw-r--r--. 1 hdfs hadoop   1048583 Jan 12 09:18 blk_1073741827_1003.meta
-rw-r--r--. 1 hdfs hadoop 134217728 Jan 12 09:18 blk_1073741828
-rw-r--r--. 1 hdfs hadoop   1048583 Jan 12 09:18 blk_1073741828_1004.meta
-rw-r--r--. 1 hdfs hadoop 134217728 Jan 12 09:19 blk_1073741829
-rw-r--r--. 1 hdfs hadoop   1048583 Jan 12 09:19 blk_1073741829_1005.meta
-rw-r--r--. 1 hdfs hadoop 134217728 Jan 12 09:19 blk_1073741830
-rw-r--r--. 1 hdfs hadoop   1048583 Jan 12 09:19 blk_1073741830_1006.meta
-rw-r--r--. 1 hdfs hadoop  87776064 Jan 12 09:19 blk_1073741831
-rw-r--r--. 1 hdfs hadoop    685759 Jan 12 09:19 blk_1073741831_1007.meta
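The .meta sizes in this listing are consistent with the metadata file holding block checksums. As a rough sketch, assuming a 7-byte header, one 4-byte CRC per 512 bytes of data (the usual dfs.bytes-per-checksum default), the sizes can be reproduced like this. These constants are assumptions on my part, not something stated above, and meta_file_size is a hypothetical helper name.

```python
# Assumed layout of a blk_*.meta file (not taken from the answer itself):
HEADER_BYTES = 7          # assumed fixed header size
BYTES_PER_CHECKSUM = 512  # assumed dfs.bytes-per-checksum default
CHECKSUM_BYTES = 4        # assumed size of one CRC32/CRC32C checksum


def meta_file_size(block_bytes,
                   bytes_per_checksum=BYTES_PER_CHECKSUM,
                   checksum_bytes=CHECKSUM_BYTES,
                   header_bytes=HEADER_BYTES):
    """Estimate the .meta file size for a block of block_bytes bytes."""
    # One checksum per chunk, rounding up for a partial final chunk.
    chunks = (block_bytes + bytes_per_checksum - 1) // bytes_per_checksum
    return header_bytes + chunks * checksum_bytes


print(meta_file_size(134217728))  # 1048583, matches the full 128 MB blocks
print(meta_file_size(87776064))   # 685759, matches the last block
```

So the metadata lives entirely outside the block file, which is why a full block is exactly 134217728 bytes.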

Upvotes: 6
