james007

Reputation: 741

ZooKeeper Failover Strategies

We are a young team building an application using Storm and Kafka. We have a common ZooKeeper ensemble of 3 nodes that is used by both Storm and Kafka.

I wrote a test case to test ZooKeeper failovers:

1) Check that all three nodes are running and confirm that one is elected as the leader.

2) Using the ZooKeeper Unix client, create a znode and set a value. Verify that the value is reflected on the other nodes.

3) Modify the znode: set a new value on one node and verify that the other nodes reflect the change.

4) Kill one of the follower nodes and make sure the leader is notified about the crash.

5) Kill the leader node. Verify that one of the other two nodes is elected as the new leader.
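
Steps 1 and 5 could be checked from a script rather than by hand. Below is a minimal sketch, assuming each server answers ZooKeeper's `srvr` four-letter-word command on its client port (on newer ZooKeeper versions the command may need to be allowed through `4lw.commands.whitelist`). The function names and the host names in the usage note are made up for illustration.

```python
import socket


def four_letter_word(host, port=2181, cmd="srvr", timeout=5.0):
    """Send a four-letter-word command to one server and return its raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def probe(hosts, port=2181):
    """Map each host to its reported mode, or None if it cannot be reached."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = parse_mode(four_letter_word(host, port))
        except OSError:  # connection refused, timeout, DNS failure, ...
            modes[host] = None
    return modes


def has_single_leader(modes):
    """True when exactly one reachable server reports itself as leader."""
    return sum(1 for mode in modes.values() if mode == "leader") == 1
```

For example, `has_single_leader(probe(["zk1", "zk2", "zk3"]))` could be asserted before killing the leader and again after the remaining two nodes have had time to hold an election.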

Do I need to add any more test cases? Any additional ideas, suggestions, or pointers?

Upvotes: 0

Views: 2664

Answers (1)

user2720864

Reputation: 8161

From the Hadoop HDFS High Availability documentation:
Verifying automatic failover

Once automatic failover has been set up, you should test its operation. To do so, first locate the active NameNode. You can tell which node is active by visiting the NameNode web interfaces -- each node reports its HA state at the top of the page.

Once you have located your active NameNode, you may cause a failure on that node. For example, you can use kill -9 to simulate a JVM crash. Or, you could power cycle the machine or unplug its network interface to simulate a different kind of outage. After triggering the outage you wish to test, the other NameNode should automatically become active within several seconds. The amount of time required to detect a failure and trigger a fail-over depends on the configuration of ha.zookeeper.session-timeout.ms, but defaults to 5 seconds.
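
To put a number on how long a failover takes, one option is to poll a health check right after triggering the outage and record the elapsed time. This is only a sketch: `wait_until` is a hypothetical helper, and the predicate passed to it (for example, a check that a new leader or active node is reporting) has to be supplied by the test.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True.

    Returns the elapsed time in seconds, or None if the timeout expires first.
    """
    start = time.monotonic()
    while True:
        if predicate():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

The returned value can then be compared against the configured session timeout; a result of `None` means no failover was observed within the timeout.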

If the test does not succeed, you may have a misconfiguration. Check the logs for the zkfc daemons as well as the NameNode daemons in order to further diagnose the issue.

more on setting up automatic failover

Upvotes: 1
