Reputation: 31

ClusterSingletonManager not failing over

I have been testing a Master / Worker cluster with the following setup:

2 virtual servers, each running a Master and a Worker (in separate JVMs)
The Master is instantiated with ClusterSingletonManager
The Masters are also the seed nodes

I was testing failover of the Masters by manually shutting down the "Active" Master node. When the Workers are not processing tasks, failover works fine: the "Non-active" Master node detects the other node as unreachable and eventually starts its Master actor.

But if the Workers are busy, failover does not complete. The "Non-active" Master node detects the other node as unreachable and quarantines it, as indicated in the message below, but it never starts the Master actor.

2014-07-23 23:52:31,777 INFO [JobRunner-akka.actor.default-dispatcher-17] Quarantined address [akka.tcp://[email protected]:40000] is still unreachable or has not been restarted. Keeping it quarantined.

Does anybody have any idea why this is happening and whether there is a solution?

Thanks. Regards.

Upvotes: 1

Answers (2)

user3784318

Reputation: 31

In the end, putting the Master nodes on their own servers (separate from the Workers) worked.

Upvotes: 1

Konrad 'ktoso' Malawski

Reputation: 13130

Which version of Akka are you using? There have been improvements in heartbeat prioritization recently; please upgrade to 2.3.4 and check.
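
If upgrading alone does not help, the settings below may also be worth reviewing. This is only a sketch, assuming Akka 2.3.x and its standard `akka.cluster` configuration keys; the values are illustrative assumptions, not tested recommendations for this setup. As far as I know, the singleton is started on another node only after the old node has been removed from cluster membership, which for an unreachable node means it must first be marked as down.

```
# application.conf (HOCON), illustrative values only
akka.cluster {
  # Automatically mark unreachable members as down after this long.
  # The default is off, in which case downing must be done manually.
  auto-down-unreachable-after = 10s

  failure-detector {
    # Tolerate longer gaps between heartbeats, for example when the
    # machine is under heavy load (default is 3s).
    acceptable-heartbeat-pause = 10s

    # A higher threshold makes the detector less eager to suspect
    # a node (default is 8.0).
    threshold = 12.0
  }
}
```

Note that automatic downing can split the cluster into two during a network partition, so it should be enabled with care.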

Upvotes: 0
