Disk full made MQ dead

Question

We have an application that uses WebSphere MQ 7.0.1.3. During extensive testing in our stage environment, the disks became full.

After this, the MQ is hanging. We removed the application logs (not related to MQ) and added more disk but it didn't solve the problem.

We tried to restart the queue manager:

$ endmqlsr
$ endmqm XYZ
$ strmqm XYZ
WebSphere MQ queue manager 'XYZ' starting.
WebSphere MQ was unable to display an error message 893.

The logs from the time when the disk became full and the error occurred:

----- amqxfdcx.c : 828 --------------------------------------------------------
06/08/2018 03:36:44 AM - Process(8832.5) User(mqm) Program(amqzlaa0)
AMQ6119: An internal WebSphere MQ error has occurred (Rc=28 from write)
----- amqxfdcx.c : 783 --------------------------------------------------------
06/08/2018 03:36:44 AM - Process(8832.5) User(mqm) Program(amqzlaa0)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager XYZ.
----- amqxfdcx.c : 822 --------------------------------------------------------
06/08/2018 03:36:46 AM - Process(8832.5) User(mqm) Program(amqzlaa0)
AMQ6119: An internal WebSphere MQ error has occurred (Rc=28 from write)
----- amqxfdcx.c : 783 --------------------------------------------------------
06/08/2018 03:36:46 AM - Process(8832.5) User(mqm) Program(amqzlaa0)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager XYZ.
AMQ6119: An internal WebSphere MQ error has occurred ('28 - No space left on device' from semget.)
----- amqxfdcx.c : 783 --------------------------------------------------------
06/14/2018 02:35:46 PM - Process(6794.1) User(mqm) Program(amqzxma0)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager XYZ.
----- amqxfdcx.c : 822 --------------------------------------------------------
06/14/2018 02:35:46 PM - Process(6794.1) User(mqm) Program(amqzxma0)
AMQ6118: An internal WebSphere MQ error has occurred (20006037)

When trying to connect with the IBM WebSphere MQ Explorer

Queue manager not available for connection - reason 2059. (AMQ4043)
Severity: 20 (Error)
Explanation: The attempt to connect to the queue manager failed. This could be because the queue manager is incorrectly configured to allow a connection from this system, or the connection has been broken.
Response: Ensure that the queue manager is running. If the queue manager is running on another computer, ensure it is configured to accept remote connections.

Is there a way of clearing all messages from the queues and resetting all flags so the queue manager will start and the queues will work again?

There are only old test data in the queues, nothing of value.

Or do you have any other suggestions on how to fix this?

JoshMc · Accepted Answer

You can use the mqrc command to provide more information on errors. Most of the time MQ reports return codes as a four digit decimal number. In this case since the return code is three digits it usually (always?) means it is a HEX return code.

$ mqrc 2195

      2195  0x00000893  MQRC_UNEXPECTED_ERROR

This error is thrown when MQ hits an error condition that was not expected. Usually you will find a FDC file was created in the /var/mqm/errors directory that could provide some more detail.

The best course of action when you receive this type of error is to open a PMR with IBM and have them provide direction on recovery to ensure you have the best chance of preserving messages that may be present on your queues, however you are using a version of MQ (7.0) that has been out of support since September 30th 2015. The specific Fix Pack you are on (7.0.1.3) was released in August 2010. The last release of v7.0 from IBM was 7.0.1.14 in August 2016.

If you pay IBM for extended support you may be able to open a PMR with them for futher support.

The best path forward once you have resolved your issue would be to migrate to a supported version of IBM MQ. Currently v8.0 and v9.0 are the only supported versions of IBM MQ at this time.

Assuming you do not have extended support and are unable to get assistance from IBM, the following are some suggested steps:

Updating even to the latest Fix Pack (7.0.1.14) may help, and if it does not solve the problem it is still better by be at the latest Fix Pack of a unsupported version of IBM MQ.
You could try to cold start your queue manager and see if that helps. This is documented starting on Page 4 of the presentation "WebSphere MQ Disaster Recovery" given by Mark Taylor at Capitalware's MQ Technical Conference v2.0.1.3.

Create a queue manager EXACTLY like the one that failed
Use qm.ini to work out parameters to crtmqm command
Log:
  LogPrimaryFiles=10
  LogSecondaryFiles=10
  LogFilePages=65535
  LogType=CIRCULAR
Issue the crtmqm command

crtmqm -lc -lf 65535 -lp 10 -ls 10 –ld /tmp/mqlogs TEMP.QMGR

Make sure there is enough space for the new log files in that directory

Name of the dummy queue manager is irrelevant

Only care about getting the log files

Don’t start this dummy queue manager, just create it
Replace old logs and amqhlctl.lfh with the new ones
cd /var/mqm/log
mv QM1 QM1.SAVE
mv /tmp/mqlogs/TEMP!QMGR QM1
Note the “mangled” directory name … this is normal
Data in the queues is preserved if messages are persistent

Object definitions are also preserved

Objects contain their own definitions in their files

Mapping between files and object names held in QMQMOBJCAT

Once all the above is complete then try and start your queue manager.

Disk full made MQ dead

Answers (1)

Related Questions