Reputation: 11607
While reading ZooKeeper's documentation, it seems to me that HDFS relies on pretty much the same mechanisms of distribution/replication (broadly speeking) as ZooKeeper. I hear some echo from one to another, but I still can't distinguish things clearly and striclty.
I understand ZooKeeper is a Cluster Management / Sync tool, while HDFS is a Distributed File Management System, but could ZK be needed on an HDFS cluster for example?
Upvotes: 0
Views: 628
Reputation: 563
Yes, the factor is distributed processing and high availability on a hadoop cluster with a zookeper's quorum
For ex. Hadoop Namenode fail over process.
Hadoop high availability is designed around Active Namenode & Standby Namenode for fail over process. At any point of time, you should not have two masters ( active Namenodes) at same time.
Zookeper resolves cluster address to an active namenode.
Upvotes: 1