user2426139

Reputation: 53

Content of the fsimage hdfs

I have a question about what the metadata in the fsimage is all about. I read that all mutations to the file system namespace, such as file renames, permission changes, file creations, and block allocations, are inside the fsimage. But what about the block location data? Does it also contain information about where (on which DataNode) the blocks are stored? I gathered from this source: http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/ that the metadata on where blocks are stored is built from the block reports of the DataNodes. Is this true? So the fsimage does not contain information about the block locations?

Upvotes: 2

Views: 3601

Answers (4)

spearkkk

Reputation: 21

First of all, the fs_image is not the same as the data stored in the NameNode's memory.

  • fs_image: a snapshot of the file metadata in HDFS
  • data in the NameNode's memory: fs_image + block locations (reported by the DataNodes in their block reports)

So there are no block locations in the fs_image.
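To make the split concrete, here is a minimal Python sketch under a toy model. `FsImage`, `NameNodeMemory`, `receive_block_report` and `locate` are hypothetical names, not Hadoop classes. The persisted part only maps each path to its block IDs; which DataNodes hold those blocks exists only in memory and is filled in from block reports.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FsImage:
    # Hypothetical persisted metadata: path -> ordered block IDs.
    # Note that no DataNode locations are stored here.
    files: dict[str, list[int]] = field(default_factory=dict)


@dataclass
class NameNodeMemory:
    image: FsImage
    # Hypothetical in-memory map: block ID -> DataNodes holding a replica.
    # It is never written to the fsimage; only block reports fill it.
    block_locations: dict[int, set[str]] = field(default_factory=dict)

    def receive_block_report(self, datanode: str, block_ids: list[int]) -> None:
        # Record that this DataNode holds each reported block.
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path: str) -> list[tuple[int, set[str]]]:
        # Combine persisted metadata (the file's blocks) with
        # in-memory knowledge (where those blocks currently live).
        return [(b, set(self.block_locations.get(b, set())))
                for b in self.image.files[path]]
```

Right after a restart, `locate` knows which blocks make up a file, but every location set stays empty until the DataNodes report in.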


In HDFS, the NameNode keeps two kinds of persistent data: the EditLog and the fs_image.

  • EditLog is a transaction log of modifications: every operation that changes the namespace is recorded in it.
  • fs_image is a snapshot of the metadata, such as file sizes, the number of files, block counts, etc.

This persistent data is what allows the NameNode's state to be recovered, which also matters for HA (High Availability).
When the NN (NameNode) goes down for any reason, the data in its memory is lost. This is a critical problem for HDFS, because without it we don't know which files exist.
When the NN recovers, it loads the fs_image and applies the edit log to it to rebuild its knowledge of which files exist in HDFS.
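The recovery step can be sketched in Python as a replay loop. This is an illustration under simplifying assumptions: the fsimage is a plain dict of path -> metadata, and the operation names (`create`, `add_block`, `rename`, `chmod`, `delete`) and the function `load_namespace` are hypothetical stand-ins, not the real edit log opcodes or NameNode code.

```python
from __future__ import annotations

import copy


def load_namespace(fsimage: dict[str, dict], edit_log: list[tuple]) -> dict[str, dict]:
    """Rebuild the namespace: start from the snapshot, then replay edits in order."""
    namespace = copy.deepcopy(fsimage)  # never mutate the on-disk snapshot
    for op, *args in edit_log:  # edits are applied in the order they were logged
        if op == "create":
            path, perms = args
            namespace[path] = {"perms": perms, "blocks": []}
        elif op == "add_block":
            path, block_id = args
            namespace[path]["blocks"].append(block_id)  # block ID only, no location
        elif op == "rename":
            src, dst = args
            namespace[dst] = namespace.pop(src)
        elif op == "chmod":
            path, perms = args
            namespace[path]["perms"] = perms
        elif op == "delete":
            (path,) = args
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return namespace
```

For example, replaying `[("create", "/b", "rw-------"), ("rename", "/a", "/c")]` onto an image containing `/a` yields a namespace with `/b` and `/c`, while the original snapshot dict is left unchanged.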

Of course, the fs_image gets large when you store a lot of data in HDFS, and the edit log grows quickly when the data changes often. A checkpoint process periodically merges the edit log into the fs_image, but managing a very large fs_image is still a challenge.
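As for when that merge runs: in Hadoop 2.x the Secondary or Standby NameNode checkpoints when either a time limit or a count of uncheckpointed transactions is reached, configured by `dfs.namenode.checkpoint.period` (default 3600 seconds) and `dfs.namenode.checkpoint.txns` (default 1,000,000). The function below is a hypothetical Python sketch of that rule, not Hadoop code; check `hdfs-default.xml` for the defaults in your version.

```python
# Hypothetical sketch of the checkpoint trigger rule, modeled on
# dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.
DEFAULT_PERIOD_SECONDS = 3600
DEFAULT_MAX_TXNS = 1_000_000


def should_checkpoint(seconds_since_last: float,
                      uncheckpointed_txns: int,
                      period_seconds: float = DEFAULT_PERIOD_SECONDS,
                      max_txns: int = DEFAULT_MAX_TXNS) -> bool:
    # Checkpoint when either limit is reached: enough time has passed,
    # or enough edits have piled up since the last merged fsimage.
    return seconds_since_last >= period_seconds or uncheckpointed_txns >= max_txns
```

The transaction limit is what keeps a busy cluster's edit log from growing unboundedly between hourly checkpoints, which in turn bounds how much replay a restarting NameNode has to do.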

Upvotes: 0

Jing Wang

Reputation: 50

Hadoop provides a tool, the Offline Image Viewer (hdfs oiv), that converts the fsimage file into human-readable formats: http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html

Sample invocation and output:

bin/hdfs oiv -i fsimagedemo -p Indented -o fsimage.txt

   FSImage
     ImageVersion = -19
     NamespaceID = 2109123098
     GenerationStamp = 1003
     INodes [NumInodes = 12]
       Inode
         INodePath =
         Replication = 0
         ModificationTime = 2009-03-16 14:16
         AccessTime = 1969-12-31 16:00
         BlockSize = 0
         Blocks [NumBlocks = -1]
         NSQuota = 2147483647
         DSQuota = -1
         Permissions
           Username = theuser
           GroupName = supergroup
           PermString = rwxr-xr-x
   ...remaining output omitted...

Upvotes: 1

Abhijeet Apsunde

Reputation: 572

The NameNode maintains two types of data:

Block location data: since files are chopped into blocks, the NN needs to know which piece is where. This data is kept in memory and never persisted to disk; the DataNodes (DNs) periodically report to the NN and send it their block reports.

File system metadata: such as the file system hierarchy, permissions, etc. This information is persisted to disk.
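A minimal Python sketch of how such an in-memory block map could be maintained, assuming full block reports (each report lists every block the DataNode currently holds, so it replaces what the NN previously believed about that DN). `BlockMap` and its methods are hypothetical names; the real NameNode logic is considerably more involved.

```python
from __future__ import annotations

from collections import defaultdict


class BlockMap:
    """Hypothetical in-memory block map: rebuilt from reports, never persisted."""

    def __init__(self) -> None:
        self._locations: dict[int, set[str]] = defaultdict(set)  # block ID -> DataNodes
        self._reported: dict[str, set[int]] = {}  # DataNode -> blocks in its last report

    def process_full_report(self, datanode: str, block_ids: set[int]) -> None:
        previous = self._reported.get(datanode, set())
        for block_id in previous - block_ids:  # replicas the DataNode no longer has
            self._locations[block_id].discard(datanode)
        for block_id in block_ids - previous:  # newly reported replicas
            self._locations[block_id].add(datanode)
        self._reported[datanode] = set(block_ids)

    def remove_datanode(self, datanode: str) -> None:
        # E.g. when the NameNode decides this DataNode is dead:
        # forget every replica it reported.
        for block_id in self._reported.pop(datanode, set()):
            self._locations[block_id].discard(datanode)

    def locations(self, block_id: int) -> set[str]:
        # Return a copy; .get() avoids creating entries for unknown blocks.
        return set(self._locations.get(block_id, set()))
```

Because nothing here touches disk, a restarted NameNode starts with an empty `BlockMap` and relearns every location from the next round of reports.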

When the NameNode starts up, it loads a "snapshot" of the file system from the fsimage and applies the edit log (edits) onto it; the result is a new snapshot. From this point on, the NameNode can accept file system requests from clients and DNs.

Upvotes: 3

abhinav

Reputation: 1282

Yes, as far as I know, the fsimage does not contain any information about block locations. That information is held by the DataNodes; the NameNode obtains it from the DataNodes' block reports when it starts up.

Upvotes: 2
