Vignesh

IBM WebSphere MQ cluster channels ending abnormally and restarting frequently

In a cluster environment, I see the channels to a particular server ending abnormally and restarting frequently during the day.
For example, QMGR A has several queue managers (B, C, D, E, F) connected to it, each on a different server.
The cluster-receiver channels from QMGRs B, C, D, E and F end abnormally on QMGR A and restart quite frequently during the day.

QMGR A LOGS


-------------------------------------------------------------------------------
08/04/12 08:44:41 - Process(1720412.1165) User(mqad) Program(amqrmppa)  
AMQ9209: Connection to host 'HOST.B (139.120.210.19)' closed.  

EXPLANATION:  
An error occurred receiving data from 'HOST.B (139.120.210.19)' over TCP/IP.  
 The connection to the remote host has unexpectedly terminated.  
ACTION:  
Tell the systems administrator.  
----- amqccita.c : 3094 -------------------------------------------------------  
08/04/12 08:44:41 - Process(1720412.1165) User(mqad) Program(amqrmppa)  
AMQ9999: Channel program ended abnormally.  

EXPLANATION:  
Channel program 'CHANNEL.TO.B' ended abnormally.  
ACTION:  
Look at previous error messages for channel program 'CHANNEL.TO.B' in the  
error files to determine the cause of the failure.  
----- amqrccca.c : 777 --------------------------------------------------------  
08/04/12 08:44:41 - Process(1720412.1175) User(mqad) Program(amqrmppa)  
AMQ9209: Connection to host 'HOST.C (155.10.186.20)' closed.  

EXPLANATION:  
An error occurred receiving data from 'HOST.C (155.10.186.20)' over TCP/IP.  
The connection to the remote host has unexpectedly terminated.  
ACTION:  
Tell the systems administrator.  
----- amqccita.c : 3094 -------------------------------------------------------  
08/04/12 08:44:41 - Process(1720412.1175) User(mqad) Program(amqrmppa)  
AMQ9999: Channel program ended abnormally.  

EXPLANATION:  
Channel program 'CHANNEL.TO.C' ended abnormally.  
ACTION:  
Look at previous error messages for channel program 'CHANNEL.TO.C' in the  
error files to determine the cause of the failure.  
-------------------------------------------------------------------------------

QMGR LOG on HOST B


08/04/2012 08:44:09 AM - Process(17174.16023) User(mqad) Program(amqrmppa)
AMQ9259: Connection timed out from host 'HOST.A'.

EXPLANATION:
A connection from host 'HOST.A' over TCP/IP timed out.
ACTION:
Check to see why data was not received in the expected time. Correct the
problem. Reconnect the channel, or wait for a retrying channel to reconnect
itself.
----- amqccita.c : 3546 -------------------------------------------------------
08/04/2012 08:44:09 AM - Process(17174.16023) User(mqad) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'CHANNEL.TO.B' ended abnormally.
ACTION:
Look at previous error messages for channel program 'CHANNEL.TO.B' in the
error files to determine the cause of the failure.


QMGR LOG on HOST C

-------------------------------------------------------------------------------
08/04/12 08:44:35 - Process(462890.4658) User(mqad) Program(amqrmppa)
AMQ9259: Connection timed out from host 'HOST.A'.

EXPLANATION:
A connection from host 'HOST.A' over TCP/IP timed out.
ACTION:
Check to see why data was not received in the expected time. Correct the
problem. Reconnect the channel, or wait for a retrying channel to reconnect
itself.
----- amqccita.c : 3341 -------------------------------------------------------
08/04/12 08:44:35 - Process(462890.4658) User(mqad) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'CHANNEL.TO.C' ended abnormally.
ACTION:
Look at previous error messages for channel program 'CHANNEL.TO.C' in the
error files to determine the cause of the failure.
----- amqrmrsa.c : 468 --------------------------------------------------------

I'm trying to understand what is causing this. Could it happen because queue manager A is overloaded with so many connections? I don't see any TCP/IP error code in the queue manager log.
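To see how often each channel ends and which network error preceded each end, you can pair every AMQ9999 entry with the previous entry written by the same process. In the logs above, the AMQ9209 (or AMQ9259) entry and the AMQ9999 entry for a channel share the same Process(...) value. The following is a minimal Python sketch under that assumption; the function names are hypothetical, and the parsing only covers the entry layout shown above, not every error-log format.

```python
import re
from collections import Counter

# Header line of each entry, e.g.
#   08/04/12 08:44:41 - Process(1720412.1165) User(mqad) Program(amqrmppa)
#   08/04/2012 08:44:09 AM - Process(17174.16023) User(mqad) Program(amqrmppa)
HEADER = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2,4} \d{2}:\d{2}:\d{2}(?: [AP]M)?)"
    r" - Process\((?P<pid>[\d.]+)\)"
)
MSGID = re.compile(r"^(AMQ\d{4}):")
CHANNEL = re.compile(r"Channel program '([^']+)' ended abnormally")


def parse_entries(log_text):
    """Split error-log text into dicts with timestamp, process id, message id and body."""
    entries = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = {"ts": header["ts"], "pid": header["pid"], "id": None, "body": []}
            entries.append(current)
        elif current is not None:
            msgid = MSGID.match(line)
            if current["id"] is None and msgid:
                current["id"] = msgid.group(1)  # first AMQnnnn line of the entry
            current["body"].append(line)
    return entries


def abnormal_ends(log_text):
    """Count AMQ9999 entries per channel and record the message that preceded each.

    The preceding message is the previous entry logged by the same process,
    which in the logs above is the network error (AMQ9209, AMQ9259, ...).
    """
    counts = Counter()
    details = []
    last_id_by_pid = {}
    for entry in parse_entries(log_text):
        if entry["id"] == "AMQ9999":
            channel = CHANNEL.search(" ".join(entry["body"]))
            if channel:
                name = channel.group(1)
                counts[name] += 1
                details.append((name, entry["ts"], last_id_by_pid.get(entry["pid"])))
        last_id_by_pid[entry["pid"]] = entry["id"]
    return counts, details
```

For example, passing the text of AMQERR01.LOG from each queue manager's errors directory to abnormal_ends() gives a per-channel count plus the timestamp and preceding message ID for each failure, which makes it easier to line up the failures on QMGR A with those on B and C. Bear in mind that the hosts' clocks may not be synchronised.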

Answers (1)

Morag Hughson

It looks like you are running a version of MQ earlier than V7.1. In MQ V7.1 that error message was updated from:

AMQ9259: Connection timed out from host 'HOST.A'.

EXPLANATION:
A connection from host 'HOST.A' over TCP/IP timed out.
ACTION:
Check to see why data was not received in the expected time. Correct the
problem. Reconnect the channel, or wait for a retrying channel to reconnect
itself.

to

AMQ9259: Connection timed out from host 'HOST.A'.

EXPLANATION:
A connection from host 'HOST.A' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 60 seconds call timed out. Check to see why data was
not received in the expected time. Correct the problem. Reconnect the channel,
or wait for a retrying channel to reconnect itself.

as an example. The most likely reason for the AMQ9259 error message is that your receive timeout settings caused the channel to give up waiting in its receive call and close the channel. I suggest you review the receive timeout settings in your qm.ini file to see whether they are set to something shorter than your heartbeat intervals.
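The comparison suggested above can be sketched in Python. This assumes the qm.ini TCP stanza attributes ReceiveTimeout, ReceiveTimeoutType (MULTIPLY, ADD or EQUAL) and ReceiveTimeoutMin, with MULTIPLY multiplying the negotiated heartbeat interval, ADD adding to it, EQUAL taking ReceiveTimeout as a number of seconds, and ReceiveTimeoutMin acting as a floor. The helper names are hypothetical, the product defaults that apply when ReceiveTimeout is not set are not modelled, and treating '#' and ';' as comment markers is an assumption; check the MQ documentation for your version before relying on the numbers.

```python
def read_stanza(ini_text, stanza="TCP"):
    """Return the Key=Value pairs of one qm.ini stanza (a line such as 'TCP:')."""
    values = {}
    current = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # assumption: '#' and ';' start comments
            continue
        if line.endswith(":") and "=" not in line:
            current = line[:-1].strip()  # a new stanza begins
        elif current == stanza and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def effective_receive_wait(tcp, hbint):
    """Seconds a message channel would wait for data before giving up.

    Returns None when ReceiveTimeout is not set, because the product
    default is not modelled in this sketch.
    """
    if "ReceiveTimeout" not in tcp:
        return None
    timeout = int(tcp["ReceiveTimeout"])
    kind = tcp.get("ReceiveTimeoutType", "MULTIPLY").upper()  # assumed default type
    if kind == "MULTIPLY":
        wait = timeout * hbint
    elif kind == "ADD":
        wait = hbint + timeout
    elif kind == "EQUAL":
        wait = timeout
    else:
        raise ValueError(f"unknown ReceiveTimeoutType {kind!r}")
    return max(wait, int(tcp.get("ReceiveTimeoutMin", 0)))  # minimum acts as a floor


def too_short(tcp, hbint):
    """True when the computed wait is positive but shorter than the heartbeat interval."""
    wait = effective_receive_wait(tcp, hbint)
    # A zero wait is deliberately not flagged: how MQ treats zero is not modelled here.
    return wait is not None and 0 < wait < hbint
```

For example, a hypothetical TCP stanza with ReceiveTimeout=60 and ReceiveTimeoutType=EQUAL gives a 60-second wait, which too_short() flags against a heartbeat interval of 300 seconds (the usual HBINT default), whereas ReceiveTimeout=2 with MULTIPLY gives 600 seconds and is not flagged.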

The channels restart automatically because you have retry intervals defined on them. This is good!
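The automatic restart follows the channel's retry attributes. Assuming the standard SHORTRTY/SHORTTMR and LONGRTY/LONGTMR channel attributes with their usual defaults (10 short retries 60 seconds apart, then long retries 1200 seconds apart), a minimal sketch of when retries would be attempted after a failure is below. It assumes each attempt happens one timer interval after the previous one, so the real channel initiator's timing may differ slightly; retry_offsets is a hypothetical helper.

```python
from itertools import islice


def retry_offsets(shortrty=10, shorttmr=60, longrty=999_999_999, longtmr=1200):
    """Yield the seconds since the failure at which each retry would be attempted.

    Short retries come first, then long retries; each attempt is assumed to
    happen one timer interval after the previous one.
    """
    t = 0
    for _ in range(shortrty):
        t += shorttmr
        yield t
    for _ in range(longrty):
        t += longtmr
        yield t
```

For example, list(islice(retry_offsets(), 12)) gives [60, 120, 180, 240, 300, 360, 420, 480, 540, 600, 1800, 3000], so with these defaults a channel that keeps failing drops from one attempt per minute to one attempt every 20 minutes after about 10 minutes.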
