Reputation: 550
I understand that on the master node the NameNode maintains its metadata in two files: the FSImage and the edit log.
The FSImage is loaded when the Hadoop system starts, and it contains the directory structure of the cluster and information about the data stored in it. After that, the edit log file is updated for every transaction that occurs.
My questions are the following:
Upvotes: 1
Views: 4726
Reputation: 1
Yes, these are the only two files that contain the cluster's file system information.
No. On every restart of the NameNode the FSImage is written to disk, and on every checkpoint the SNN (Secondary NameNode) writes the FSImage to disk.
On a busy cluster the edit log grows very fast. If the edit log is very large, the next restart of the NN takes longer. The SNN periodically merges the edit log and the FSImage. The SNN also serves as a backup of the FSImage in case the NN's disk fails.
Yes. The FSImage is updated in main memory, not on disk. At the same time, the edit log on disk is updated with the new transaction.
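To make the restart cost concrete, here is a minimal toy model in Python. It is not Hadoop code: the class `ToyNameNode`, its `mutate` method and the JSON edit format are all made up for illustration. It only assumes what is described above: the namespace lives in memory, every change is appended to the edit log, and a restart loads the old image and replays the log.

```python
import json


class ToyNameNode:
    """Hypothetical toy model, not the real NameNode."""

    def __init__(self, fsimage, edit_log):
        # Startup: load the last image, then replay every logged edit.
        self.namespace = set(fsimage)
        self.edit_log = list(edit_log)
        self.replayed = 0
        for line in self.edit_log:
            self._apply(json.loads(line))
            self.replayed += 1

    def _apply(self, edit):
        if edit["op"] == "create":
            self.namespace.add(edit["path"])
        elif edit["op"] == "delete":
            self.namespace.discard(edit["path"])

    def mutate(self, op, path):
        edit = {"op": op, "path": path}
        self.edit_log.append(json.dumps(edit))  # stands in for the on-disk append
        self._apply(edit)                       # only the in-memory state changes


nn = ToyNameNode(fsimage={"/data"}, edit_log=[])
nn.mutate("create", "/data/a.txt")
nn.mutate("create", "/data/b.txt")
nn.mutate("delete", "/data/a.txt")

# Simulated restart: the image is still the old one, but the edit log survived.
restarted = ToyNameNode(fsimage={"/data"}, edit_log=nn.edit_log)
print(sorted(restarted.namespace))  # ['/data', '/data/b.txt']
print(restarted.replayed)           # 3
```

The number of replayed edits grows with the log, which is why a long edit log makes the next restart slow.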
Upvotes: 0
Reputation: 2255
To understand this, we have to go through, step by step, what happens while Hadoop is running.
After loading the FSImage, the NameNode holds in memory the whole snapshot of where the data is stored.
As transactions come in, the information is stored in the edit log.
Periodically (by default every hour), the checkpoint node/secondary NameNode retrieves the logs, merges them with the latest fsimage, and keeps the result as a checkpoint. At this point, the NN has the image in memory, the edit logs are emptied, and the latest checkpoint is stored as an image on the SNN/CN.
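The merge step described above can be sketched as a small Python function. This is only an illustration under simplifying assumptions: the image is a plain dict of path to size, edits are `(op, path, size)` tuples, and the function name `checkpoint` is made up. The real on-disk formats are binary and far richer.

```python
def checkpoint(fsimage, edits):
    """Fold logged edits into a copy of the image.

    Returns (new_image, remaining_edits); the old image is left untouched.
    """
    merged = dict(fsimage)
    for op, path, size in edits:
        if op == "create":
            merged[path] = size
        elif op == "delete":
            merged.pop(path, None)
    # Every edit is now part of the image, so the log can start empty again.
    return merged, []


fsimage = {"/logs/day1": 128}
edits = [
    ("create", "/logs/day2", 64),
    ("delete", "/logs/day1", 0),
    ("create", "/logs/day3", 32),
]
new_image, remaining = checkpoint(fsimage, edits)
print(new_image)  # {'/logs/day2': 64, '/logs/day3': 32}
print(remaining)  # []
```

Because the merge works on a copy, whoever runs it (the SNN/CN in the description above) does not need to block the process that keeps serving requests from its in-memory image.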
To answer your questions:
Yes, there are only two files.
The fsimage on the SNN/CN is updated regularly. The fsimage on the NN is updated when a checkpoint gets imported. This should happen at least with every reboot.
Merging the edit log into the fsimage is a costly operation. It would require the NameNode to enter safe mode in order to merge the data, which is not possible in such an environment.
A delete is logged just as a write is, so it gets stored in the edit log.
Upvotes: 2
Reputation: 1133
1) Yes, there are only these two files.
2) This is true for the NameNode.
3) It is copied to the secondary NameNode for persistent storage. Things work fine as long as the NameNode is up. Say you have made many changes, such as creating directories and files or putting data into HDFS; at run time this information is loaded directly into memory. But if the NameNode goes down, any new metadata that is not yet part of the current fsimage would be lost permanently, because when the system comes back up it loads the fsimage into memory, and since that is the old fsimage it does not contain the new changes. With the secondary NameNode, these changes are preserved in edit.log; the edit.log file is then used to build a new fsimage, which replaces the old one.
4) Whenever the metadata changes, the event is written to the edit.log file. After a specified interval, or when the logs grow too large, they are copied to the secondary NameNode, and the edit.log information is then flushed into the form of an fsimage.
The current fsimage is not updated when a file is added or deleted; these changes are applied directly in memory.
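As a rough sketch of the "specified interval or size" rule in point 4: in Hadoop 2.x and later, hdfs-default.xml documents two properties for this, `dfs.namenode.checkpoint.period` (default 3600 seconds) and `dfs.namenode.checkpoint.txns` (default 1,000,000 transactions), and a checkpoint is triggered when either limit is reached. The Python function below is a hypothetical helper, not Hadoop's code, and the defaults should be checked against your version.

```python
def checkpoint_due(seconds_since_last, txns_since_last,
                   period_s=3600, txn_limit=1_000_000):
    """Simplified trigger: checkpoint when either limit is reached.

    period_s  mirrors dfs.namenode.checkpoint.period (assumed default)
    txn_limit mirrors dfs.namenode.checkpoint.txns   (assumed default)
    """
    return seconds_since_last >= period_s or txns_since_last >= txn_limit


print(checkpoint_due(600, 50_000))      # False: neither limit reached
print(checkpoint_due(3600, 50_000))     # True: one hour has passed
print(checkpoint_due(600, 2_000_000))   # True: busy cluster, log grew fast
```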
Upvotes: 0