bigdata123
bigdata123

Reputation: 463

secondary name node functionality

Can someone explain what exactly the words in bold mean which are taken from text book? What does "state of the secondary namenode lags that of the primary " mean?

Secondary name node keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. **However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain.**The usual course of action in this case is to copy the namenode’s metadata files that are on NFS to the secondary and run it as the new primary.

Thanks in advance

Upvotes: 2

Views: 1797

Answers (1)

Sandeep Singh
Sandeep Singh

Reputation: 7990

Hadoop 1.x:

When we start ha hadoop cluster its creates a file system image which keeps the metadata information of your entire hadopp cluster. When a new entry comes into the hadoop cluster it goes to edits log. Secondary NameNode periodically reads and query the edits and retrieve the information and merge the information with fsimage. In case NameNode fails, hadoop administrator can start the hadoop cluster with the help of fsimage and edits.(during start NameNode reads the edits and fsimage so there wont be data loss)

Fsimage and edits log already keeps the updated information about file system in the form of metadata so in case of total failure of primary hadoop administrator can recover the cluster information with help of edits log and fsimage.

Hadoop 2.x:

In hadoop 1.x NameNode was a single point of failure. Failure of NameNode was downtime for your entire hadoop cluster. Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in periods of cluster downtime.To overcome this issue hadoop community added High Availability feature. During the setting up of hadoop cluster you can choose which type of cluster you want.

The HDFS NameNode High Availability feature enables you to run redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby.Both NameNode require the same type of hardware configuration.

In HA configuration one NameNode will be active and other will be in standby state.The ZKFailoverController (ZKFC) is a ZooKeeper client that monitors and manages the state of the NameNode. When active NameNode goes down, It makes standby as active NameNode, and primary NameNode will become standby when you start them. Please can get more on it on this website: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.8.0/bk_system-admin-guide/content/ch_hadoop-ha-5.html

In HA hadoop cluster Active NameNode reads and write metadata information in JournalNode(Quorum-based Storage only). JournalNode is a separate node in HA hadoop cluster used for reads and write edits log and fsimage.

Standby NameNodealways synchronized with active NameNode, both communicate with each other through Journal Node. When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. Standby NameNode constantly monitors edit logs at journal nodes and updates its namespace accordingly.In the event of failover, standby NameNode will ensure that its namespace is completely updated according to edit logs before it is changes to active state. When standby will be in active state it will start writing edits log into JournalNode.

Hadoop don't keep any data into NameNode, All data resides in datanode, In case of NameNode failure there wont be any loss of data.

Upvotes: 2

Related Questions