Reputation: 125
I am using Apache Curator Leader Election Recipe : https://curator.apache.org/curator-recipes/leader-election.html in my application.
Zookeeper version : 3.5.7 Curator : 4.0.1
Below are the sequence of steps: 1. Whenever my tomcat server instance is getting up, I create a single CuratorFramework instance(single instance per tomcat server) and start it :
CuratorFramework client = CuratorFrameworkFactory.newClient(connectionString, retryPolicy);
client.start();
if(!client.blockUntilConnected(10, TimeUnit.MINUTES)){
LOGGER.error("Zookeeper connection could not establish!");
throw new RuntimeException("Zookeeper connection could not establish");
}
LSAdapter adapter = new LSAdapter(client, <some_metadata>);
adapter.start();
Below is my LSAdapter class :
public class LSAdapter extends LeaderSelectorListenerAdapter implements Closeable {
//<Class instance variables defined>
public LSAdapter(CuratorFramework client, <some_metadata>) {
leaderSelector = new LeaderSelector(client, <path_to_be_used_for_leader_election>, this);
leaderSelector.autoRequeue();
}
public void start() throws IOException {
leaderSelector.start();
}
@Override
public void close() throws IOException {
leaderSelector.close();
}
@Override
public void takeLeadership(CuratorFramework client) throws Exception {
final int waitSeconds = (int) (5 * Math.random()) + 1;
LOGGER.info(name + " is now the leader. Waiting " + waitSeconds + " seconds...");
LOGGER.debug(name + " has been leader " + leaderCount.getAndIncrement() + " time(s) before.");
while (true) {
try {
Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
//do leader tasks
} catch (InterruptedException e) {
LOGGER.error(name + " was interrupted.");
//cleanup
Thread.currentThread().interrupt();
} finally {
}
}
}
}
CloseableUtils.closeQuietly(lsAdapter);
curatorFrameworkClient.close();
The issue I am facing is that at times, when server is restarted, no leader gets elected. I checked that by tracing the log inside takeLeadership(). I have two tomcat server instances with above code, connecting to same zookeeper quorum and most of the times one of the instance becomes leader but when this issue happens, both of them becomes follower. Please suggest what am I doing wrong.
Upvotes: 1
Views: 672
Reputation: 2956
As I answered on Curator's Jira, you are swallowing the interrupted exception. When you get InterruptedException you must exit your takeLeadership(). In your code example, you are merely resetting the interrupted state and continuing the loop - this will cause an infinite loop of interrupted exceptions, btw. After calling Thread.currentThread().interrupt(); you should exit the while loop.
Upvotes: 0