Torsten Römer
Torsten Römer

Reputation: 3926

WildFly - HornetQ connection timeout and performance issue

In a WildFly 8.2.1-Final high-volume installation we are experiencing degrading performance within about a week until restart is necessary along with more and more frequently occurring failures to send JMS messages because of transactions in state ABORT_ONLY:

Caused by: javax.ejb.EJBException: javax.jms.JMSException: Could not create a session: IJ000457: Unchecked throwable in managedConnectionReconnected()
Caused by: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3915e409[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@7d6bcd06 connection handles=0 lastUse=1494579251303 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.PoolByCri@691085ec mcp=SemaphoreArrayListManagedConnectionPool@66a127f5[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@56a9bb4[xaResource=org.hornetq.ra.HornetQRAXAResource@4ca920eb pad=false overrideRmValue=null productName=HornetQ productVersion=2.0 jndiName=java:/JmsXA] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:780)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:518)
        at org.hornetq.ra.HornetQRASessionFactoryImpl.allocateConnection(HornetQRASessionFactoryImpl.java:948)
        ... 261 more
Caused by: javax.resource.ResourceException: IJ000461: Could not enlist in transaction on entering meta-aware object
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:551)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:775)
        ... 263 more
Caused by: javax.transaction.SystemException: IJ000356: Failed to enlist: java.lang.Throwable: Unabled to enlist resource, see the previous warnings. tx=TransactionImple < ac, BasicAction: 0:ffff0a0001c3:682851c6:5900737b:dd09939 status: ActionStatus.ABORT_ONLY >
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener$TransactionSynchronization.checkEnlisted(TxConnectionListener.java:848)
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener.enlist(TxConnectionListener.java:383)
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:544)
        ... 264 more

The reason why the transaction is in state ABORT_ONLY is a warning previously logged:

2017-05-30 11:33:15,977 WARN  [com.arjuna.ats.jta] (Thread-123) ARJUNA016086: TransactionImple.enlistResource setTransactionTimeout on XAResource < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff0a0001c3:-7a698606:591fe1ff:82cea51, node_name=1, branch_uid=0:ffff0a0001c3:-7a698606:591fe1ff:830e516, subordinatenodename=null, eis_name=java:/JmsXA > threw: XAException.XAER_RMERR: javax.transaction.xa.XAException
Caused by: HornetQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Timed out waiting for response when sending packet 51]
       at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:390)
       at org.hornetq.core.client.impl.ClientSessionImpl.start(ClientSessionImpl.java:1901)

Some details:

The messages are sent from a singleton EJB while the relevant business method is annotated with @Lock(LockType.READ), so it can be called concurrently. The EJB injects the java:/JmsXA TopicConnectionFactory and the Topic and creates a new TopicConnection, Session and MessageProducer on each invocation of the method (each message being sent) and takes care to close all of them in a finally {} block.

Since the timeouts occur sending packets 51 (SESS_XA_START) and 63 (SESS_XA_SET_TIMEOUT) it seems starting the session is slow or hangs.

Any idea what the reason could be?

Upvotes: 3

Views: 1968

Answers (2)

Gab
Gab

Reputation: 8323

Experimented the same kind of issues with Ironjacamar 1.0.9 on a jboss as 7.1.1 Final with XA transaction on Mysql and JMS resource using JCA PooledConnectionFactory (and patched it the same way, see below)

Below is the flow of errors :

  • it start by a Mysql timeout (the JTA coordinator timeout is set to 100s)

    • ARJUNA012095: Abort of action id 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 invoked while multiple threads active within it.
    • TransactionReaper::check successfuly marked TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 as rollback only
    • JDBCExceptionReporter - Lock wait timeout exceeded; try restarting transaction
  • Then came a first XA error (on a different thread)

    • XA operation failed Cannot start, session is already doing work in a transaction XidImpl xxx
  • And finally the same error come back again and again each time on different thread more and more frequently, referencing the SAME transaction id

    • XA operation failed Cannot start, session is already doing work in a transaction XidImpl xxx

At the end 100% of JMS post operation fail on the same error until we reboot the server or the connection pool is flushed and refilled from scatch

Below the pach code (according to Torsten Römer answer)

protected void reconnectManagedConnection(ConnectionListener cl) throws ResourceException
   {
      try
      {
         managedConnectionReconnected(cl);
      }
      catch (Throwable t)
      {
         disconnectManagedConnection(cl);
         // begin patch
         try
         {
            log.warn("JCA patch : forcing destroy on disconnected managed connection ");
            cl.getManagedConnection().destroy();
         }
         catch (ResourceException e)
         {
            log.warn("JCA patch : ResourceException thrown while forcing connection destroy : " + e.getMessage());
         }
         // end patch
         throw new ResourceException(bundle.uncheckedThrowableInManagedConnectionReconnected(cl), t);
      }
   }

And below a raw log (truncated) with the error flow

2018-03-20 05:08:04,408 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 in state  RUN
2018-03-20 05:08:04,411 [] [Transaction Reaper Worker 0] WARN com.arjuna.ats.arjuna - ARJUNA012095: Abort of action id 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 invoked while multiple threads active within it.
2018-03-20 05:08:04,411 [] [Transaction Reaper Worker 0] WARN com.arjuna.ats.arjuna - ARJUNA012108: CheckedAction::check - atomic action 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 aborting with 1 threads active!
2018-03-20 05:08:04,911 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 in state  CANCEL
2018-03-20 05:08:05,413 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 in state  CANCEL_INTERRUPTED
2018-03-20 05:08:05,414 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012120: TransactionReaper::check worker Thread[Transaction Reaper Worker 0,5,main] not responding to interrupt when cancelling TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 -- worker marked as zombie and TX scheduled for mark-as-rollback
2018-03-20 05:08:05,414 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012110: TransactionReaper::check successfuly marked TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 as rollback only

[...]

2018-03-20 05:08:26,882 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-23] ERROR o.h.util.JDBCExceptionReporter - Lock wait timeout exceeded; try restarting transaction
2018-03-20 05:08:26,886 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-23] ERROR o.h.e.d.AbstractFlushingEventListener - Could not synchronize database state with session
org.hibernate.exception.GenericJDBCException: Could not execute JDBC batch update
        at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:126) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
        [...]
        Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1084) ~[mysql-connector-java-5.1.30.jar!/:na]
        [...]

[...]   

2018-03-20 05:08:32,886 [] [Transaction Reaper Worker 0] ERROR o.h.c.client.impl.ClientSessionImpl - Caught Exception
org.hornetq.api.core.HornetQException: Timed out waiting for response when sending packet 52
        at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:302) ~[hornetq-core-2.2.13.Final.jar!/:na]
        [...]
2018-03-20 05:08:32,887 [] [Transaction Reaper Worker 0] WARN com.arjuna.ats.arjuna - ARJUNA012091: Top-level abort of action 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 received TwoPhaseOutcome.FINISH_ERROR from com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord
2018-03-20 05:08:32,887 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-23] WARN com.arjuna.ats.arjuna - ARJUNA012077: Abort called on already aborted atomic action 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5
2018-03-20 05:08:32,889 [] [Transaction Reaper Worker 0] WARN com.arjuna.ats.arjuna - ARJUNA012113: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] missed interrupt when cancelling TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 -- exiting as zombie (zombie count decremented to 0)

[...]

2018-03-20 05:08:36,407 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-1] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:08:36,460 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-1] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521518916451 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        [...]/
        Caused by: javax.resource.ResourceException: IJ000461: Could not enlist in transaction on entering meta-aware object
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:474) ~[na:na]
        [...]
        Caused by: javax.transaction.SystemException: IJ000356: Failed to enlist: java.lang.Throwable: Unabled to enlist resource, see the previous warnings. tx=TransactionImple < ac, BasicAction: 0:ffff0af02d68:-63fd0719:5ab027d7:5b978 status: ActionStatus.ABORT_ONLY >
        [...]
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener$TransactionSynchronization.checkEnlisted(TxConnectionListener.java:552) ~[na:na]
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener.enlist(TxConnectionListener.java:282) ~[na:na]
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:467) ~[na:na]
        ... 164 common frames omitted
2018-03-20 05:08:44,602 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-25] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:08:44,663 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-25] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521518924643 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:467) ~[na:na]
[...]
2018-03-20 05:09:08,906 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-6] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:09:08,957 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-6] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521518948948 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:467) ~[na:na]
[...]
2018-03-20 05:09:34,353 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-26] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:09:34,474 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-26] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521518974452 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:467) ~[na:na]
[...]
2018-03-20 05:14:49,036 [tenant=bloxxx] [ajp--0_0_0_0_0_0_0_0-8009-14] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:14:49,070 [tenant=bloxxx] [ajp--0_0_0_0_0_0_0_0-8009-14] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521519289064 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:467) ~[na:na]

Upvotes: 1

Torsten R&#246;mer
Torsten R&#246;mer

Reputation: 3926

Since we could see that in thousands of these errors just a few distinct connections were involved, we got the suspicion that connections somehow break and stay in the pool, so these broken connections are reused, causing consecutive errors.

Post #5 in this similar issue says:

The connections from the JmsXA connection factory do indeed come from a pool. However, each of those connections has a listener attached to it so that when a connection-level error occurs it should be kicked out of the pool automatically by the JCA implementation. The strangest thing about your issue is that everything is local so there shouldn't be any broken connections or any connectivity problems of any sort.

To make sure these broken connections are removed from the pool we inserted a call to cl.getManagedConnection().destroy() in line 799 in AbstractConnectionManager#reconnectManagedConnection(ConnectionListener) from ironjacamar-core-impl 1.1.9, replaced the JAR and let the system run for some weeks now.

It is running stable so far, does not slow down or even hang.

Some logging added to the same method logs the pool statistics and we can see that during this time 100 connections were destroyed but more important, in each of the initial error an individual connection is involved, so these broken connections are not reused any more.

The initial error causing the connection to break in the first place is still unclear. From the logging it seems that the same JMS connection is used by two threads at the same time, messing up the communication causing the timeouts and eventually leaving the connection in a broken state.

Another guess is that this is an issue in Ironjacamar 1.1.9 and was fixed long time ago, and does not occur in later releases of WildFly.

Upvotes: 1

Related Questions