Reputation: 369
As I understand, Apache Spark Master can be run in high availability mode using Zookeeper. That is, multiple Spark masters can run in Leader/Follower mode and these modes are registered with ZooKeeper.
In our scenario ZooKeeper is expiring the Spark Master's session which is acting as Leader. So the Spark Master which is leader receives this notification and shuts down deliberately.
Can someone explain why this decision of shutting down rather than retrying has been taken?
And why does Kafka retry connecting to Zookeeper when it receives the same Expiry notification?
Upvotes: 1
Views: 351
Reputation: 5957
Looks like you're hitting issue SPARK-15544 - Bouncing Zookeeper node causes Active spark master to exit.
Shutting Down a single zookeeper node caused spark master to exit. The master should have connected to a second zookeeper node.
As of March 2019, they're looking into a fix. You can follow the JIRA if you want to see when it's resolved.
Upvotes: 2