Michael Strauß

Reputation: 71

ActiveMQ master not giving up on network failure

I have an ActiveMQ installation with master/slave failover. Master and slave are synchronized using the lease-database-locker. They run on two different machines, and the database is located on a third machine.

Failover and client reconnection work properly on a forced shutdown of the master broker. The slave takes over and the clients reconnect thanks to their failover setting.

The problems start when I simulate a network outage on the master broker only. I do this with an iptables DROP rule for packets going from the master to the database.

The master now realizes that it can no longer connect to the database. The slave starts up, since its network connection is still alive. From the logs, it seems that the clients still try to reconnect to the non-responding master.

As I understand it, the master should inform the clients that there is no connection anymore, and the clients should then fail over and reconnect to the slave. But this is not happening.

The clients do reconnect to the slave once I restore the master's network connection to the database. The master then gives up being the master.

Is there a way to force the master to inform the clients to fail over in this particular case?

Upvotes: 3

Views: 1624

Answers (2)

Michael Strauß

Reputation: 71

After some digging I found the trick. The broker was not informing the clients due to a missing ioExceptionHandler configuration.

The documentation can be found here: http://activemq.apache.org/configurable-ioexception-handling.html

I needed to specify

<bean id="ioExceptionHandler" class="org.apache.activemq.util.LeaseLockerIOExceptionHandler">
    <property name="stopStartConnectors"><value>true</value></property>
    <property name="resumeCheckSleepPeriod"><value>5000</value></property>
</bean>

and tell the broker to use the Handler

<broker xmlns="http://activemq.apache.org/schema/core" ....
        ioExceptionHandler="#ioExceptionHandler" >

In order to produce an error on network outages I also had to set a queryTimeout on the lease query:

 <jdbcPersistenceAdapter dataDirectory="${activemq.base}/data" dataSource="#mysql-ds-db01-st" lockKeepAlivePeriod="3000">
      <locker>
           <lease-database-locker lockAcquireSleepInterval="10000" queryTimeout="8" />
      </locker>

This will produce an SQL exception if the query takes too long due to a network outage.

I tested the network outage by dropping packets to the database using an iptables rule:
/sbin/iptables -A OUTPUT -p tcp --destination-port 13306 -j DROP

Upvotes: 4

Tim Bish

Reputation: 18356

Sounds like your client doesn't have the address of the slave in its URI, so it doesn't know where to reconnect to. The master broker doesn't tell the client where the slave is, because it doesn't know whether there are any slaves or where they might be on the network. Even if it did, that information would be unreliable, depending on the conditions that caused the master broker to drop in the first place.

You need to provide the client with the connection information for the master and the slave in the failover URI.
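As a rough illustration of such a URI, here is a minimal sketch that assembles a failover transport URI listing both brokers. The host names, the port, and the helper class FailoverUri are assumptions made up for this example and are not taken from the question; randomize=false is a failover transport option that makes the client try the brokers in the listed order.

```java
import java.util.List;

public class FailoverUri {
    // Joins the given broker URIs into a single failover transport URI.
    // Hypothetical helper for illustration; not part of ActiveMQ.
    static String build(List<String> brokerUris, boolean randomize) {
        return "failover:(" + String.join(",", brokerUris) + ")?randomize=" + randomize;
    }

    public static void main(String[] args) {
        // Assumed host names and port; replace them with your own brokers.
        String uri = build(
                List.of("tcp://master-host:61616", "tcp://slave-host:61616"),
                false);
        System.out.println(uri);
    }
}
```

The resulting string can then be passed as the broker URL to the client's connection factory, for example new ActiveMQConnectionFactory(uri).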

Upvotes: 0
