Reputation: 6686
In the Hadoop ecosystem we have a NameNode and a SecondaryNameNode. The NameNode is responsible for managing all the data available in the cluster. So my question is: when the NameNode goes down, how does the ecosystem replace it and recover with another NameNode?
Upvotes: 4
Views: 5417
Reputation: 34184
There are two things to consider here:
1- Recovery through the SecondaryNameNode
2- Recovery through a redundant NameNode
In hadoop-1.x we have the concept of a SecondaryNameNode, which holds a copy of the NameNode metadata. This copy is a periodic checkpoint rather than a live mirror, so it can lag behind the NameNode's most recent edits. If your NameNode goes down, you can take the metadata copy stored on the SecondaryNameNode and use it to resume your work once your NameNode is up again.
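To make the checkpoint lag concrete, here is a toy model in Python. It is not Hadoop code: the `apply_edits` helper, the paths, and the edit format are all hypothetical, and the only idea it borrows is that the namespace is a saved image plus a replayed edit log.

```python
# Toy model (not Hadoop code): namespace state = saved image + replayed edits.

def apply_edits(fsimage, edits):
    """Return a new namespace with each (op, path) edit applied in order."""
    namespace = set(fsimage)
    for op, path in edits:
        if op == "create":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
    return namespace

# NameNode state: the last saved image plus the edits logged since then.
fsimage = {"/data/a"}
edit_log = [("create", "/data/b")]

# Checkpoint: the SecondaryNameNode merges image and edits and keeps a copy.
checkpoint = apply_edits(fsimage, edit_log)

# More changes arrive after the checkpoint was taken.
edits_after_checkpoint = [("create", "/data/c"), ("delete", "/data/a")]
live_namespace = apply_edits(checkpoint, edits_after_checkpoint)

# If the NameNode's own metadata is lost, only the checkpoint can be restored.
recovered = checkpoint
lost_changes = live_namespace ^ recovered  # paths whose state differs

print(sorted(recovered))
print(sorted(lost_changes))
```

Anything logged after the last checkpoint is missing from the recovered state, which is why the SecondaryNameNode is a recovery aid rather than a hot standby.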
With hadoop-2.x (HA) you can have more than one NameNode. If the primary NameNode goes down, the redundant NameNode can take over, through either manual or automatic failover, so that your cluster doesn't stop working. In this implementation there is a pair of NameNodes in an active/standby configuration. If the active NameNode fails, the standby takes over its duties and continues servicing client requests.
To take advantage of the HA feature, you should run the NameNodes in HA mode with either a quorum of JournalNodes or shared NFS storage for the edit log transaction files. I would suggest going through these posts, which explain the recovery mechanisms beautifully:
2- http://blog.cloudera.com/blog/2012/10/quorum-based-journaling-in-cdh4-1/
3- http://blog.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
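As a rough illustration of the HA setup with a quorum of JournalNodes, the Python sketch below renders the core hdfs-site.xml properties. The property names are the standard HDFS HA ones; the nameservice name, NameNode IDs, and host names are hypothetical, and the `ha_properties` and `to_xml` helpers exist only for this example. A real deployment also needs fencing configured, plus ZooKeeper settings if you want automatic failover, so treat this as a starting point rather than a complete config.

```python
from xml.sax.saxutils import escape

def ha_properties(nameservice, namenodes, journal_nodes):
    """Build the core HA properties for hdfs-site.xml as a dict.

    namenodes:     {namenode_id: "host:rpc_port"}
    journal_nodes: ["host:port", ...]
    """
    props = {
        "dfs.nameservices": nameservice,
        # Joining a dict joins its keys, i.e. the NameNode IDs.
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Shared edit log served by the JournalNode quorum.
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(journal_nodes) + "/" + nameservice,
        # Lets clients find whichever NameNode is currently active.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    for nn_id, address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    return props

def to_xml(props):
    """Render a property dict in the Hadoop <configuration> XML layout."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)

# Hypothetical hosts: two NameNodes and three JournalNodes.
props = ha_properties(
    "mycluster",
    {"nn1": "namenode1.example.com:8020", "nn2": "namenode2.example.com:8020"},
    ["jn1.example.com:8485", "jn2.example.com:8485", "jn3.example.com:8485"],
)
print(to_xml(props))
```

With a configuration along these lines, a manual failover between the two hypothetical IDs would be triggered with `hdfs haadmin -failover nn1 nn2`.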
But if you are on hadoop-1.x, you are better off having two separate locations for storing the NameNode metadata (one drive on the machine itself plus one NAS).
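In hadoop-1.x this is done through the dfs.name.dir property, which accepts a comma-separated list of directories; the NameNode writes its metadata to each of them. A small Python sketch that renders the property follows. The two paths and the `name_dir_property` helper are hypothetical, chosen to stand for one local disk and one NFS-mounted NAS directory.

```python
def name_dir_property(directories):
    """Render the hadoop-1.x dfs.name.dir property for hdfs-site.xml."""
    value = ",".join(directories)
    return (
        "<property>\n"
        "  <name>dfs.name.dir</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )

# Hypothetical paths: a local disk and an NFS mount backed by the NAS.
snippet = name_dir_property(["/data/1/dfs/nn", "/mnt/nas/dfs/nn"])
print(snippet)
```

If the NameNode machine is lost, the copy on the NAS is still current, unlike a SecondaryNameNode checkpoint.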
HTH
Upvotes: 5