Loui

Reputation: 550

Hadoop - HDFS Namenode metadata - FSImage

I understand that the master node runs the NameNode, which maintains its metadata in two files: the FSImage and the edit log.
The FSImage is loaded when the Hadoop system starts; it contains the directory structure of the cluster and information about the stored data. After that, the edit log is updated for every transaction.

My questions are the following:

  1. Are these (FSImage and edit log) the only files that contain all the information, or are there more?
  2. Does this mean that the FSImage file is written only once?
  3. If yes, then why is it always copied to the Secondary NameNode? Doesn't that just add extra work?
  4. Suppose I add or delete a file in HDFS; wouldn't the FSImage be updated?

Upvotes: 1

Views: 4726

Answers (3)

UmaSankar

Reputation: 1

  1. Yes, these are the only two files that contain the cluster's file system information.

  2. No. The FSImage is written to disk on every NameNode (NN) restart, and the Secondary NameNode (SNN) writes a new FSImage to disk at every checkpoint.

  3. On a busy cluster the edit log grows very fast. If the edit log is very large, the next NN restart takes longer. The SNN periodically merges the edit log into the FSImage. The SNN's copy also serves as a backup of the FSImage in case the NN's disk fails.

  4. Yes. The FSImage is updated in main memory, not on disk. At the same time, the edit log on disk is updated with the new transaction.
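
As a rough illustration of point 4, here is a minimal toy model in Python. It is not Hadoop code: the `ToyNameNode` class, its fields, and the paths are all hypothetical, and a plain list stands in for the on-disk edit log. It only shows that a change updates the in-memory namespace and the edit log while the image that was read at startup stays as it was.

```python
class ToyNameNode:
    """Hypothetical toy model, not the real NameNode implementation."""

    def __init__(self, fsimage_on_disk):
        # The image is read once at startup and is not rewritten per transaction.
        self.fsimage_on_disk = frozenset(fsimage_on_disk)
        # In-memory namespace: always reflects the latest state.
        self.namespace = set(fsimage_on_disk)
        # Stand-in for the on-disk edit log (a list here, for simplicity).
        self.edit_log = []

    def create(self, path):
        self.edit_log.append(("CREATE", path))  # record the transaction first
        self.namespace.add(path)                # then apply it in memory

    def delete(self, path):
        self.edit_log.append(("DELETE", path))
        self.namespace.discard(path)


nn = ToyNameNode({"/data/a.txt"})
nn.create("/data/b.txt")
nn.delete("/data/a.txt")

print(sorted(nn.namespace))        # ['/data/b.txt']
print(sorted(nn.fsimage_on_disk))  # ['/data/a.txt']
print(nn.edit_log)                 # [('CREATE', '/data/b.txt'), ('DELETE', '/data/a.txt')]
```

The image only catches up with the in-memory state when a checkpoint merges the logged transactions into it.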

Upvotes: 0

Stefan Papp

Reputation: 2255

To understand this, we have to go through what happens while Hadoop is running, step by step:

  1. After loading the FSImage, the NameNode holds in memory the whole snapshot of where data is stored.

  2. As transactions come in, they are recorded in the edit log.

  3. Periodically (every hour by default), the checkpoint node/Secondary NameNode retrieves the edit logs, merges them with the latest fsimage, and keeps the result as a checkpoint. At this point the NN has the image in memory, the edit logs are emptied, and the latest checkpoint is stored as an image on the SNN/CN.
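
The merge in step 3 can be sketched in a few lines of Python. This is only a simplified model under assumptions of my own: the namespace is a set of paths, each edit is an `(operation, path)` tuple, and the function names `apply_edits` and `checkpoint` are made up for illustration. The real fsimage and edit log formats are far richer.

```python
def apply_edits(image, edits):
    """Return a new namespace built by replaying the edits on top of the image."""
    namespace = set(image)
    for op, path in edits:
        if op == "CREATE":
            namespace.add(path)
        elif op == "DELETE":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


def checkpoint(fsimage, edit_log):
    """Merge the edit log into the image; return (new_fsimage, emptied_edit_log)."""
    new_fsimage = apply_edits(fsimage, edit_log)
    return new_fsimage, []


fsimage = {"/logs/day1"}
edit_log = [
    ("CREATE", "/logs/day2"),
    ("DELETE", "/logs/day1"),
    ("CREATE", "/logs/day3"),
]

fsimage, edit_log = checkpoint(fsimage, edit_log)
print(sorted(fsimage))  # ['/logs/day2', '/logs/day3']
print(edit_log)         # []
```

After the checkpoint, the new image already contains every logged change, so a later restart has nothing old to replay.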

To answer your questions:

  1. Yes, there are only two files.

  2. The fsimage on the SNN/CN is updated regularly. The fsimage on the NN is updated when a checkpoint is imported. This should happen at least on every restart.

  3. Merging the edit log into the fsimage is a costly operation. The NameNode would have to enter safe mode to merge the data itself, which is not practical on a running cluster.

  4. A delete is a logged transaction just like a write, so it is stored in the edit log.

Upvotes: 2

user3484461

Reputation: 1133

    1) Yes, there are only these two files.

    2) This is true for the NameNode.

    3) It is copied to the Secondary NameNode for persistent storage. Things work fine as long as the NameNode is up. Say you have made many changes (creating directories and files, putting data into HDFS, and so on): at run time this information is applied directly in memory. But if the NameNode goes down, any new metadata that is not yet part of the current fsimage would be lost permanently, because on restart the system loads the old fsimage, which does not contain the new changes. With the Secondary NameNode, these changes are preserved in the edit log, the edit log is merged into the fsimage, and the new fsimage replaces the old one.

    4) The process: whenever metadata changes, the event is written to the edit log. At a specified interval, or when the log grows too big, the logs are copied to the Secondary NameNode and merged into a new fsimage. The current fsimage is not updated when a file is added or deleted; those changes are applied directly in memory.
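
To make the failure scenario concrete, here is a small Python sketch of a restart. Everything in it is an assumption made for illustration: the file names `fsimage.json` and `edits.log`, the JSON encoding, and the helpers `log_edit` and `load_namespace` are hypothetical and do not match HDFS's real on-disk formats. It shows that a restart from the old image alone loses recent changes, while a restart that also replays a persisted edit log recovers them.

```python
import json
import os
import tempfile


def log_edit(edits_path, op, path):
    """Append one transaction to the edit log and force it to disk."""
    with open(edits_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"op": op, "path": path}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def load_namespace(fsimage_path, edits_path):
    """Simulate a restart: load the image, then replay the edit log if present."""
    with open(fsimage_path, encoding="utf-8") as f:
        namespace = set(json.load(f))
    if os.path.exists(edits_path):
        with open(edits_path, encoding="utf-8") as f:
            for line in f:
                edit = json.loads(line)
                if edit["op"] == "CREATE":
                    namespace.add(edit["path"])
                elif edit["op"] == "DELETE":
                    namespace.discard(edit["path"])
    return namespace


with tempfile.TemporaryDirectory() as d:
    fsimage_path = os.path.join(d, "fsimage.json")
    edits_path = os.path.join(d, "edits.log")

    # Old image written at the last checkpoint.
    with open(fsimage_path, "w", encoding="utf-8") as f:
        json.dump(["/user/old.csv"], f)

    # Changes made after the checkpoint go only to the edit log.
    log_edit(edits_path, "CREATE", "/user/new.csv")
    log_edit(edits_path, "DELETE", "/user/old.csv")

    # Simulated crash: memory is gone, only the files remain.
    recovered = load_namespace(fsimage_path, edits_path)
    stale = load_namespace(fsimage_path, os.path.join(d, "missing.log"))

print(sorted(recovered))  # ['/user/new.csv']
print(sorted(stale))      # ['/user/old.csv']
```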

Upvotes: 0
