Reputation: 959
Is there a way to diagnose HA issues in ActiveMQ Artemis? I have a pair of shared-store servers that work really well. When I shut down the primary, the secondary takes over until the primary tells it that it's back up; then the primary takes over again and the secondary goes back to being a secondary.
I took the configuration and basically copied it to another pair of servers, but this one isn't working.
Everything looks fine, as far as I can tell. The cluster appears in the console, and the two servers connect. When I shut down the primary, the secondary logs this message:
2020-12-06 16:59:26,379 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to <Primary IP>/<Primary IP>:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
In the working pair, right after this message the secondary quickly deploys all my addresses and queues and takes over. But in the new pair, the secondary does nothing after this.
I'm not sure where to start looking. I just keep comparing the configuration of the non-working pair with the working pair.
I'm using an NFS mount. The shared storage is Azure's NetApp.
Here are my broker configurations. They should be correct, though, because the same settings work on the other pair...
Primary:
<connectors>
   <connector name="artemis">tcp://<primary URL>:61616</connector>
   <connector name="artemis-backup">tcp://<secondary URL>:61616</connector>
</connectors>
<cluster-user>activemq</cluster-user>
<cluster-password>artemis123</cluster-password>
<ha-policy>
   <shared-store>
      <master>
         <failover-on-shutdown>true</failover-on-shutdown>
      </master>
   </shared-store>
</ha-policy>
<cluster-connections>
   <cluster-connection name="cluster-1">
      <connector-ref>artemis</connector-ref>
      <static-connectors>
         <connector-ref>artemis-backup</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
Secondary:
<connectors>
   <connector name="artemis-live">tcp://<primary URL>:61616</connector>
   <connector name="artemis">tcp://<secondary URL>:61616</connector>
</connectors>
<cluster-user>activemq</cluster-user>
<cluster-password>artemis123</cluster-password>
<ha-policy>
   <shared-store>
      <slave>
         <allow-failback>true</allow-failback>
         <failover-on-shutdown>true</failover-on-shutdown>
      </slave>
   </shared-store>
</ha-policy>
<cluster-connections>
   <cluster-connection name="cluster-1">
      <connector-ref>artemis</connector-ref>
      <static-connectors>
         <connector-ref>artemis-live</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
Upvotes: 0
Views: 438
Reputation: 35008
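In the shared-store configuration the backup broker continuously attempts to acquire a file lock on the journal. However, since the master broker already holds the lock, the backup can't acquire it until the master dies or shuts down. If the backup never activates after the master goes away, the lock is probably not being released (or not being seen as released) on the backup's side. Therefore, I would look at the shared storage and ensure that file locking is working properly.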
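One rough way to check locking independently of the broker is a small probe script run from both hosts against a scratch file on the same NFS mount. The sketch below is only an illustration, not anything Artemis ships: the function name `try_lock` and the scratch file are hypothetical, and it assumes a Linux host where Python's `fcntl.lockf` takes the same kind of POSIX advisory lock that, as far as I know, the JVM uses for its file locks.

```python
import errno
import fcntl
import sys


def try_lock(path):
    """Try to take an exclusive, non-blocking POSIX lock on `path`.

    Returns the open file object that holds the lock, or None if another
    process (possibly on another NFS client) already holds it. Any other
    error, e.g. ENOLCK when NFS locking is unavailable, is re-raised.
    """
    f = open(path, "a+b")  # creates the file if it does not exist
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        f.close()
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return None  # lock is held elsewhere
        raise
    return f


# Hypothetical command-line usage: python lockprobe.py /path/on/nfs/scratch.lock
if __name__ == "__main__" and len(sys.argv) == 2:
    handle = try_lock(sys.argv[1])
    if handle is None:
        print("lock is held by another process")
    else:
        print("lock acquired; press Enter to release")
        input()
        handle.close()  # closing the file releases the lock
```

Run it on the primary host first and leave it holding the lock, then run it on the secondary host: it should report that the lock is held. Release (or kill) the first process and run it on the secondary again: it should now acquire the lock. If the second host can lock while the first still holds it, or can't lock long after the first has gone, the problem is in the storage or NFS client rather than in the broker configuration. Use a scratch file for this, not the broker's own lock file.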
Since you're using NFS, the NFS client configuration options are worth inspecting as well. Here are the configuration options I would recommend to enable reasonable fail-over times:
Upvotes: 1