Newerth

Reputation: 574

Artemis slave node does not go live on start in a full cluster

I'm trying to run a 6-node Apache Artemis static cluster as 3 live-backup pairs. To avoid losing data during a full cluster restart, I've also created a mechanism that figures out which node of each pair has the newer data; based on that, the server automatically decides whether it's safe to start that particular node. During a full cluster restart the servers are started simultaneously, and nodes with newer data start before nodes that were most recently acting as backups.

This scenario has been working flawlessly with a single live-backup pair.

Shutdown:

  1. Master node is live, slave node is backup
  2. Stop master node => slave node goes live
  3. Stop slave node

Start:

  1. Both nodes are dead
  2. Start slave node => goes live
  3. Start master node => goes live, slave node goes backup
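
The mechanism itself isn't shown here. Purely as an illustration of the ordering above, a minimal sketch of such a start-order check might look like the following, assuming each node persists a timestamp of when it last stopped while it was live. The names NodeState, last_live_stop, start_order and safe_to_start_first are hypothetical and are not part of Artemis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeState:
    name: str
    # Hypothetical marker: epoch seconds at which this node last stopped
    # while it was live; None if it has never been live.
    last_live_stop: Optional[float]


def _recency(node: NodeState) -> float:
    # A node that has never been live sorts behind every node that has.
    return node.last_live_stop if node.last_live_stop is not None else float("-inf")


def start_order(a: NodeState, b: NodeState) -> List[NodeState]:
    """Order a pair so the node that was live most recently starts first."""
    return sorted([a, b], key=_recency, reverse=True)


def safe_to_start_first(node: NodeState, peer: NodeState) -> bool:
    """True if `node` holds the newer data and may start before `peer`."""
    return start_order(node, peer)[0] is node


# The master stopped first, so the slave was live last and holds newer data.
master = NodeState("master", last_live_stop=100.0)
slave = NodeState("slave", last_live_stop=200.0)
print([n.name for n in start_order(master, slave)])
```

With these assumed timestamps the slave is ordered before the master, which matches the Start sequence above.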

However, in the 6-node cluster the backup node does not go live at all:

  1. 3 live nodes, 3 backup nodes; pairs bound together in separate groups (rs1, rs2, rs3)
  2. Stop rs1 master node => rs1 slave node goes live
  3. Stop rs1 slave node
  4. Start rs1 slave node => never goes live
  5. Must not start rs1 master node because of potential data loss

Any idea what's wrong with my setup?

Master HA configuration (the group name varies):

<?xml version="1.0" encoding="UTF-8"?>
<ha-policy xmlns="urn:activemq:core">
   <replication>
      <master>
         <cluster-name>clouedi-mq-dev</cluster-name>
         <group-name>rs1</group-name>
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>

Slave HA configuration (the group name varies):

<?xml version="1.0" encoding="UTF-8"?>
<ha-policy xmlns="urn:activemq:core">
   <replication>
      <slave>
         <cluster-name>clouedi-mq-dev</cluster-name>
         <group-name>rs1</group-name>
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>

Thank you.

Before the issue

Master

2020-11-30 12:34:54,021 INFO  [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
2020-11-30 12:34:54,082 INFO  [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
2020-11-30 12:34:59,425 INFO  [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2020-11-30 12:34:59,429 WARN  [org.apache.activemq.artemis.core.server] AMQ222007: Security risk! Apache ActiveMQ Artemis is running with the default cluster admin user and default password. Please see the cluster chapter in the ActiveMQ Artemis User Guide for instructions on how to change this.
2020-11-30 12:34:59,435 INFO  [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 1,073,741,824
2020-11-30 12:34:59,487 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
...address deployment...
2020-11-30 12:35:00,123 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:61616 for protocols [CORE,STOMP]
2020-11-30 12:35:00,124 INFO  [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
2020-11-30 12:35:00,124 INFO  [org.apache.activemq.artemis.core.server] AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.15.0 [edimq-broker-master-dev-az1-1, nodeID=6937ad47-309c-11eb-811e-0275ba22b614]
2020-11-30 12:35:00,315 INFO  [org.apache.activemq.artemis.core.server] AMQ221027: Bridge ClusterConnectionBridge@58d025c9 [name=$.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4, queue=QueueImpl[name=$.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614], temp=false]@7c398410 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@58d025c9 [name=$.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4, queue=QueueImpl[name=$.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614], temp=false]@7c398410 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=edimq-broker-master-dev-az1-2, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-206], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1779219567[nodeUUID=6937ad47-309c-11eb-811e-0275ba22b614, connector=TransportConfiguration(name=edimq-broker-master-dev-az1-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-96, address=, server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614])) [initialConnectors=[TransportConfiguration(name=edimq-broker-master-dev-az1-2, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-206], discoveryGroupConfiguration=null]] is connected
2020-11-30 12:35:00,318 INFO  [org.apache.activemq.artemis.core.server] AMQ221027: Bridge ClusterConnectionBridge@77158457 [name=$.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe, queue=QueueImpl[name=$.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614], temp=false]@2f988822 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@77158457 [name=$.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe, queue=QueueImpl[name=$.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614], temp=false]@2f988822 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=edimq-broker-master-dev-az1-3, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-20], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1779219567[nodeUUID=6937ad47-309c-11eb-811e-0275ba22b614, connector=TransportConfiguration(name=edimq-broker-master-dev-az1-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-96, address=, server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614])) [initialConnectors=[TransportConfiguration(name=edimq-broker-master-dev-az1-3, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-20], discoveryGroupConfiguration=null]] is connected
2020-11-30 12:35:00,688 INFO  [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2020-11-30 12:35:00,766 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2020-11-30 12:35:01,187 INFO  [io.hawt.HawtioContextListener] Initialising hawtio services
2020-11-30 12:35:01,196 INFO  [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-11-30 12:35:01,199 INFO  [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.12 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-11-30 12:35:01,203 INFO  [io.hawt.jmx.UploadManager] Using file upload directory: /opt/artemis/edimq-broker-master-dev-az1-1/tmp/uploads
2020-11-30 12:35:01,217 INFO  [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-11-30 12:35:01,240 INFO  [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/artemis/edimq-broker-master-dev-az1-1/etc//jolokia-access.xml]
2020-11-30 12:35:01,269 INFO  [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2020-11-30 12:35:01,389 INFO  [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, 10.35.2.96, edimq-broker-master-dev-az1-1.dc01.clouedi.local]
2020-11-30 12:35:01,722 INFO  [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2020-11-30 12:35:01,722 INFO  [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2020-11-30 12:35:01,723 INFO  [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
2020-11-30 12:35:18,072 INFO  [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending AIOSequentialFile:/opt/artemis/edimq-broker-master-dev-az1-1/data/journal/activemq-data-45.amq (size=10,485,760) to replica.
2020-11-30 12:35:18,687 INFO  [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /opt/artemis/edimq-broker-master-dev-az1-1/data/bindings/activemq-bindings-2.bindings (size=1,048,576) to replica.
2020-11-30 12:35:18,697 INFO  [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /opt/artemis/edimq-broker-master-dev-az1-1/data/bindings/activemq-bindings-50.bindings (size=1,048,576) to replica.
2020-11-30 12:35:18,709 INFO  [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /opt/artemis/edimq-broker-master-dev-az1-1/data/bindings/activemq-bindings-37.bindings (size=1,048,576) to replica.

Slave

2020-11-30 12:35:16,773 INFO  [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
2020-11-30 12:35:16,847 INFO  [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
2020-11-30 12:35:16,976 INFO  [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/artemis/edimq-broker-slave-dev-az1-1/data/journal/oldreplica.44
2020-11-30 12:35:16,978 INFO  [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/artemis/edimq-broker-slave-dev-az1-1/data/journal to /opt/artemis/edimq-broker-slave-dev-az1-1/data/journal/oldreplica.46
2020-11-30 12:35:17,039 INFO  [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2020-11-30 12:35:17,101 WARN  [org.apache.activemq.artemis.core.server] AMQ222007: Security risk! Apache ActiveMQ Artemis is running with the default cluster admin user and default password. Please see the cluster chapter in the ActiveMQ Artemis User Guide for instructions on how to change this.
2020-11-30 12:35:17,107 INFO  [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 1,073,741,824
2020-11-30 12:35:17,263 INFO  [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2020-11-30 12:35:17,356 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2020-11-30 12:35:17,405 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2020-11-30 12:35:17,863 INFO  [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.15.0 [null] started, waiting live to fail before it gets active
2020-11-30 12:35:17,974 INFO  [io.hawt.HawtioContextListener] Initialising hawtio services
2020-11-30 12:35:17,985 INFO  [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-11-30 12:35:17,987 INFO  [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.12 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-11-30 12:35:17,992 INFO  [io.hawt.jmx.UploadManager] Using file upload directory: /opt/artemis/edimq-broker-slave-dev-az1-1/tmp/uploads
2020-11-30 12:35:18,021 INFO  [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-11-30 12:35:18,058 INFO  [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/artemis/edimq-broker-slave-dev-az1-1/etc//jolokia-access.xml]
2020-11-30 12:35:18,089 INFO  [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2020-11-30 12:35:18,216 INFO  [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, 10.35.2.101, edimq-broker-slave-dev-az1-1.dc01.clouedi.local]
2020-11-30 12:35:18,581 INFO  [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2020-11-30 12:35:18,582 INFO  [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2020-11-30 12:35:18,582 INFO  [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
2020-11-30 12:35:18,964 INFO  [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614 is synchronized with live-server.
2020-11-30 12:35:18,981 INFO  [org.apache.activemq.artemis.core.server] AMQ221031: backup announced

Shutting down

Master

2020-11-30 12:38:48,621 WARN  [org.apache.activemq.artemis.core.server] AMQ222294:
**************************************************************************************************************************************************************************************************************************************************************
There is a possible split brain on nodeID 6937ad47-309c-11eb-811e-0275ba22b614, coming from connectors 6937ad47-309c-11eb-811e-0275ba22b614. Topology update ignored.
**************************************************************************************************************************************************************************************************************************************************************
2020-11-30 12:38:48,632 WARN  [org.apache.activemq.artemis.core.server] AMQ222294:
**************************************************************************************************************************************************************************************************************************************************************
There is a possible split brain on nodeID 6937ad47-309c-11eb-811e-0275ba22b614, coming from connectors 6937ad47-309c-11eb-811e-0275ba22b614. Topology update ignored.
**************************************************************************************************************************************************************************************************************************************************************
2020-11-30 12:38:48,644 INFO  [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe
2020-11-30 12:38:48,644 INFO  [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4
2020-11-30 12:38:48,796 INFO  [io.hawt.HawtioContextListener] Destroying hawtio services
2020-11-30 12:38:48,800 INFO  [io.hawt.web.AuthenticationFilter] Destroying hawtio authentication filter
2020-11-30 12:38:48,849 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
2020-11-30 12:38:48,853 INFO  [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
2020-11-30 12:38:48,877 INFO  [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.15.0 [6937ad47-309c-11eb-811e-0275ba22b614] stopped, uptime 3 minutes

Slave goes live

2020-11-30 12:38:48,621 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: LiveFailoverQuorumVote
2020-11-30 12:38:48,622 INFO  [org.apache.activemq.artemis.core.server] AMQ221084: Requested 2 quorum votes
2020-11-30 12:38:48,623 INFO  [org.apache.activemq.artemis.core.server] AMQ221067: Waiting 30 seconds for quorum vote results.
2020-11-30 12:38:48,637 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to 10.35.2.96/10.35.2.96:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2020-11-30 12:38:48,644 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to 10.35.2.96/10.35.2.96:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2020-11-30 12:38:48,666 INFO  [org.apache.activemq.artemis.core.server] AMQ221060: Sending quorum vote request to 10.35.2.20/10.35.2.20:61616: ServerConnectVote [nodeId=6937ad47-309c-11eb-811e-0275ba22b614, vote=false]
2020-11-30 12:38:48,667 INFO  [org.apache.activemq.artemis.core.server] AMQ221060: Sending quorum vote request to 10.35.2.206/10.35.2.206:61616: ServerConnectVote [nodeId=6937ad47-309c-11eb-811e-0275ba22b614, vote=false]
2020-11-30 12:38:48,675 INFO  [org.apache.activemq.artemis.core.server] AMQ221061: Received quorum vote response from 10.35.2.206/10.35.2.206:61616: ServerConnectVote [nodeId=6937ad47-309c-11eb-811e-0275ba22b614, vote=true]
2020-11-30 12:38:48,676 INFO  [org.apache.activemq.artemis.core.server] AMQ221061: Received quorum vote response from 10.35.2.20/10.35.2.20:61616: ServerConnectVote [nodeId=6937ad47-309c-11eb-811e-0275ba22b614, vote=true]
2020-11-30 12:38:48,677 INFO  [org.apache.activemq.artemis.core.server] AMQ221068: Received all quorum votes.
2020-11-30 12:38:48,682 INFO  [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
2020-11-30 12:38:48,705 INFO  [org.apache.activemq.artemis.core.server] AMQ221037: ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614 to become 'live'
2020-11-30 12:38:48,722 WARN  [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
...address deployment...
2020-11-30 12:38:49,217 INFO  [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
2020-11-30 12:38:49,238 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:61616 for protocols [CORE,STOMP]
2020-11-30 12:38:49,269 INFO  [org.apache.activemq.artemis.core.server] AMQ221027: Bridge ClusterConnectionBridge@50794972 [name=$.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4, queue=QueueImpl[name=$.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614], temp=false]@13fa6cc targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@50794972 [name=$.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4, queue=QueueImpl[name=$.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614], temp=false]@13fa6cc targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=edimq-broker-master-dev-az1-2, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-206], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1854054128[nodeUUID=6937ad47-309c-11eb-811e-0275ba22b614, connector=TransportConfiguration(name=edimq-broker-slave-dev-az1-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-101, address=, server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614])) [initialConnectors=[TransportConfiguration(name=edimq-broker-master-dev-az1-2, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-206], discoveryGroupConfiguration=null]] is connected
2020-11-30 12:38:49,270 INFO  [org.apache.activemq.artemis.core.server] AMQ221027: Bridge ClusterConnectionBridge@39f7703c [name=$.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe, queue=QueueImpl[name=$.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614], temp=false]@66eabf5 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@39f7703c [name=$.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe, queue=QueueImpl[name=$.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614], temp=false]@66eabf5 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=edimq-broker-master-dev-az1-3, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-20], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1854054128[nodeUUID=6937ad47-309c-11eb-811e-0275ba22b614, connector=TransportConfiguration(name=edimq-broker-slave-dev-az1-1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-101, address=, server=ActiveMQServerImpl::serverUUID=6937ad47-309c-11eb-811e-0275ba22b614])) [initialConnectors=[TransportConfiguration(name=edimq-broker-master-dev-az1-3, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=10-35-2-20], discoveryGroupConfiguration=null]] is connected

Shutting down slave

2020-11-30 12:39:56,693 WARN  [org.apache.activemq.artemis.core.server] AMQ222294:
**************************************************************************************************************************************************************************************************************************************************************
There is a possible split brain on nodeID 6937ad47-309c-11eb-811e-0275ba22b614, coming from connectors 6937ad47-309c-11eb-811e-0275ba22b614. Topology update ignored.
**************************************************************************************************************************************************************************************************************************************************************
2020-11-30 12:39:56,694 WARN  [org.apache.activemq.artemis.core.server] AMQ222294:
**************************************************************************************************************************************************************************************************************************************************************
There is a possible split brain on nodeID 6937ad47-309c-11eb-811e-0275ba22b614, coming from connectors 6937ad47-309c-11eb-811e-0275ba22b614. Topology update ignored.
**************************************************************************************************************************************************************************************************************************************************************
2020-11-30 12:39:56,704 INFO  [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.clouedi-mq-dev.88c93073-309c-11eb-9fb4-02ca1af23dbe
2020-11-30 12:39:56,711 INFO  [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.clouedi-mq-dev.793500b2-309c-11eb-998e-02436e93e5b4
2020-11-30 12:39:56,859 INFO  [io.hawt.HawtioContextListener] Destroying hawtio services
2020-11-30 12:39:56,863 INFO  [io.hawt.web.AuthenticationFilter] Destroying hawtio authentication filter
2020-11-30 12:39:56,914 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
2020-11-30 12:39:56,918 INFO  [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
2020-11-30 12:39:56,946 INFO  [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.15.0 [6937ad47-309c-11eb-811e-0275ba22b614] stopped, uptime 4 minutes

Starting up

Slave

2020-11-30 12:40:34,153 INFO  [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
2020-11-30 12:40:34,209 INFO  [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
2020-11-30 12:40:34,240 INFO  [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/artemis/edimq-broker-slave-dev-az1-1/data/bindings/oldreplica.19
2020-11-30 12:40:34,246 INFO  [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/artemis/edimq-broker-slave-dev-az1-1/data/bindings to /opt/artemis/edimq-broker-slave-dev-az1-1/data/bindings/oldreplica.21
2020-11-30 12:40:34,248 INFO  [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/artemis/edimq-broker-slave-dev-az1-1/data/journal/oldreplica.45
2020-11-30 12:40:34,248 INFO  [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/artemis/edimq-broker-slave-dev-az1-1/data/journal to /opt/artemis/edimq-broker-slave-dev-az1-1/data/journal/oldreplica.47
2020-11-30 12:40:34,249 INFO  [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/artemis/edimq-broker-slave-dev-az1-1/data/paging/oldreplica.19
2020-11-30 12:40:34,257 INFO  [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/artemis/edimq-broker-slave-dev-az1-1/data/paging to /opt/artemis/edimq-broker-slave-dev-az1-1/data/paging/oldreplica.21
2020-11-30 12:40:34,321 INFO  [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2020-11-30 12:40:34,402 WARN  [org.apache.activemq.artemis.core.server] AMQ222007: Security risk! Apache ActiveMQ Artemis is running with the default cluster admin user and default password. Please see the cluster chapter in the ActiveMQ Artemis User Guide for instructions on how to change this.
2020-11-30 12:40:34,413 INFO  [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 1,073,741,824
2020-11-30 12:40:34,607 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2020-11-30 12:40:34,774 INFO  [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2020-11-30 12:40:34,888 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2020-11-30 12:40:34,927 INFO  [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.15.0 [null] started, waiting live to fail before it gets active
2020-11-30 12:40:35,301 INFO  [io.hawt.HawtioContextListener] Initialising hawtio services
2020-11-30 12:40:35,310 INFO  [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-11-30 12:40:35,312 INFO  [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.12 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-11-30 12:40:35,320 INFO  [io.hawt.jmx.UploadManager] Using file upload directory: /opt/artemis/edimq-broker-slave-dev-az1-1/tmp/uploads
2020-11-30 12:40:35,339 INFO  [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-11-30 12:40:35,365 INFO  [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/artemis/edimq-broker-slave-dev-az1-1/etc//jolokia-access.xml]
2020-11-30 12:40:35,392 INFO  [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2020-11-30 12:40:35,511 INFO  [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, 10.35.2.101, edimq-broker-slave-dev-az1-1.dc01.clouedi.local]
2020-11-30 12:40:35,819 INFO  [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2020-11-30 12:40:35,819 INFO  [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2020-11-30 12:40:35,819 INFO  [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console

The log above shows that the slave starts as a backup but never goes live.

Upvotes: 4

Views: 1121

Answers (1)

When ActiveMQ Artemis uses replication, the live and the backup servers do not share the same data directories; all data synchronization is done over the network.

Upon start-up the backup server will first need to synchronize all existing data from the live server before becoming capable of replacing the live server should it fail. So unlike when using shared storage, a replicating backup will not be a fully operational backup right after start-up, but only after it finishes synchronizing the data with its live server.
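
One way to see which stage a backup has reached is to scan its log for the message codes that appear in the question's logs: AMQ221109 (backup started, waiting), AMQ221024 (synchronized with the live server) and AMQ221007 (server is now live). The following is a minimal sketch under that assumption; BackupState and backup_state are hypothetical helper names, not an Artemis API.

```python
import re
from enum import Enum
from typing import Iterable


class BackupState(Enum):
    NOT_STARTED = 0
    WAITING = 1       # AMQ221109: backup started, waiting for the live to fail
    SYNCHRONIZED = 2  # AMQ221024: backup is synchronized with the live server
    LIVE = 3          # AMQ221007: server is now live


# Matches the six-digit message code, e.g. "AMQ221109:".
_CODE = re.compile(r"\bAMQ(\d{6}):")

_TRANSITIONS = {
    "221109": BackupState.WAITING,
    "221024": BackupState.SYNCHRONIZED,
    "221007": BackupState.LIVE,
}


def backup_state(lines: Iterable[str]) -> BackupState:
    """Return the state implied by the last relevant message code in the log."""
    state = BackupState.NOT_STARTED
    for line in lines:
        match = _CODE.search(line)
        if match and match.group(1) in _TRANSITIONS:
            state = _TRANSITIONS[match.group(1)]
    return state


restart_log = [
    "2020-11-30 12:40:34,927 INFO  [org.apache.activemq.artemis.core.server] "
    "AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.15.0 [null] "
    "started, waiting live to fail before it gets active",
]
print(backup_state(restart_log))
```

Applied to the slave's final start-up log from the question, only AMQ221109 is present, so the sketch reports the waiting state: no synchronization message was ever logged.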

Why does a single live-backup pair appear to work?

At start-up the backup node executes the following steps in SharedNothingBackupActivation:

  • Initialize (it appears to be running)
  • Wait for the cluster connection (it blocks here)
  • Start the backup manager

So in a scenario with a single live-backup pair, the backup node doesn't complete the initialization.
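
The three steps above can be pictured with a toy model. This is only a sketch of the sequence as listed, not Artemis's actual SharedNothingBackupActivation code; the function backup_activation and its cluster_connected event are hypothetical.

```python
import threading
from typing import List


def backup_activation(cluster_connected: threading.Event, timeout: float) -> List[str]:
    """Toy model: return the activation steps that were reached."""
    # Step 1: the process is up, so from the outside it looks like it is running.
    steps = ["initialize"]

    # Step 2: block until a cluster connection is available (or give up after
    # `timeout` seconds; the timeout exists only so this sketch terminates).
    steps.append("wait for cluster connection")
    if not cluster_connected.wait(timeout):
        return steps  # stuck at step 2; the backup manager is never started

    # Step 3: only reached once the cluster connection exists.
    steps.append("start backup manager")
    return steps


no_cluster = threading.Event()
print(backup_activation(no_cluster, timeout=0.05))
```

In this model a node that never gets its cluster connection stops at the second step, even though its first step already made it look started.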

Upvotes: 1
