Sats

Reputation: 115

WebSphere server that may be hung

I am getting the errors below. Please help.

[8/5/14 21:06:54:277 GMT-08:00] 00000091 DiscoveryTx   W   DCSV1115W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\nodeagent connection  was closed. Member will  be removed from view. DCS connection status is Discovery|Ptp, transmitter closed.
[8/5/14 21:07:23:562 GMT-08:00] 00000010 MbuRmmAdapter W   DCSV1115W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_PYMTCAPTURE_CL02 connection  was closed. Member will  be removed from view. DCS connection status is View|Gossip, this member is suspected by the other member.
[8/5/14 21:08:00:079 GMT-08:00] 00000091 DiscoveryTx   W   DCSV1115W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_DOWNSTREAM_CL02 connection  was closed. Member will  be removed from view. DCS connection status is Discovery|Ptp, transmitter closed.
[8/5/14 21:08:16:296 GMT-08:00] 00000010 RmmPtpGroup   W   DCSV1112W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_DOWNSTREAM_CL02 failed to respond to periodic heartbeats. Member will be removed from view. Configured Timeout is 180000 milliseconds. DCS logical channel is View|Ptp.
[8/5/14 21:08:29:236 GMT-08:00] 00000091 DiscoveryTx   W   DCSV1115W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_DMGR_Node\dmgr connection  was closed. Member will  be removed from view. DCS connection status is Discovery|Ptp, transmitter closed.
[8/5/14 21:10:20:892 GMT-08:00] 00000018 ApplicationMo W   DCSV0004W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Did not receive adequate CPU time slice. Last known CPU usage time at 21:03:08:272 GMT-08:00. Inactivity duration was 402 seconds. 
[8/5/14 21:11:14:131 GMT-08:00] 00000043 ThreadMonitor W   WSVR0605W: Thread "WMQJCAResourceAdapter : 5" (00000067) has been active for 657039 milliseconds and may be hung.  There is/are 2 thread(s) in total in the server that may be hung.
    at com.ibm.ejs.ras.TraceLogger.doLog(TraceLogger.java:332)
    at com.ibm.ejs.ras.TraceLogger.processEvent(TraceLogger.java:319)
    at com.ibm.ws.logging.WsHandlerWrapper.publish(WsHandlerWrapper.java:43)
    at java.util.logging.Logger.log(Logger.java:1121)
    at com.ibm.ejs.ras.Tr.logToJSR47Logger(Tr.java:1681)
    at com.ibm.ejs.ras.Tr.fireEvent(Tr.java:1643)
    at com.ibm.ejs.ras.Tr.fireTraceEvent(Tr.java:1565)
    at com.ibm.ejs.ras.Tr.entry(Tr.java:816)
    at com.ibm.ws.sib.utils.ras.SibTr.entry(SibTr.java:912)
    at com.ibm.ws.wmqcsi.trace.TraceImpl.methodExit(TraceImpl.java:349)
    at com.ibm.msg.client.commonservices.trace.Trace.methodExitInternal(Trace.java:715)
    at com.ibm.msg.client.commonservices.trace.Trace.exit(Trace.java:628)
    at com.ibm.msg.client.wmq.v6.jms.internal.JMSMessage._setJMSXObjectProperty(JMSMessage.java:3928)
    at com.ibm.msg.client.wmq.v6.jms.internal.MQJMSMessage.write(MQJMSMessage.java:1223)
    at com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.sendInternal(MQMessageProducer.java:1139)
    at com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.send(MQMessageProducer.java:768)
    at com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.send(MQMessageProducer.java:2713)
    at com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.sendMessage(JmsMessageProducerImpl.java:872)
    at com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.send_(JmsMessageProducerImpl.java:727)
    at com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.send(JmsMessageProducerImpl.java:398)
    at com.ibm.mq.jms.MQMessageProducer.send(MQMessageProducer.java:281)
    at com.ibm.ejs.jms.JMSQueueSenderHandle.send(JMSQueueSenderHandle.java:204)

Upvotes: 1

Views: 9637

Answers (4)

Jack job

Reputation: 1

Do the following steps:

  1. Ensure that the deployment manager is up and running.
  2. Verify that the app server and node agent are stopped, with no Java processes related to either still running.
  3. Go to NODE_PROFILE\bin (the node profile, not the deployment manager profile).
  4. Run syncNode.sh/bat, passing the deployment manager host (and port, if it is not the default).
  5. Run startNode.sh/bat.
  6. If the node agent starts successfully, you should be able to start the server from the command line or the web console.

Upvotes: 0

Rachana K

Reputation: 104

This is a general warning that can appear while the server is starting.

The basic idea is that when you start the server, threads are initialized for the process or job you want to run on it. Each thread waits for the resources it needs to run that job, and if those resources are unavailable, the thread can hang.

One way to fix it: kill the background process that is causing the thread to hang, then start the server again.

Upvotes: 0

whitfiea

Reputation: 1943

The log entry starting with

ThreadMonitor W WSVR0605W: Thread "WMQJCAResourceAdapter : 5" (00000067) has been 
active for 657039 milliseconds and may be hung.

indicates that the thread has been active for that length of time, but the stack trace it prints is only a snapshot of the thread at the moment the log entry was written. The thread could have spent 90% of that time stuck at one point in the code, while the trace shows only where it is now.

At that moment the thread is appending an entry to the trace logs while the application attempts to send an MQ JMS message, so there is no indication that the thread is hung at that point.

A couple of things to try:

  1. Investigate CPU usage, since the CPU starvation message (DCSV0004W) indicates it is a problem.
  2. Search SystemOut.log for corresponding messages saying the threads are no longer hung.
  3. Take javacores at 2-minute intervals to see which threads are moving.
  4. Turn off trace unless you need it.
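
Step 2 above can be scripted. The following is a minimal Python sketch, not a WebSphere tool: it pairs each WSVR0605W warning with a later WSVR0606W message for the same thread ID and returns the threads that were never reported as completed. The function name `still_hung` is made up for illustration, and the WSVR0606W pattern assumes that message uses the same `Thread "name" (id)` layout as the WSVR0605W line in the question.

```python
import re

# WSVR0605W layout is taken from the log line quoted in the question.
HUNG = re.compile(r'WSVR0605W: Thread "([^"]+)" \(([0-9a-fA-F]+)\)')
# ASSUMPTION: WSVR0606W ("previously reported to be hung but has completed")
# identifies the thread with the same 'Thread "name" (id)' layout.
CLEARED = re.compile(r'WSVR0606W: Thread "([^"]+)" \(([0-9a-fA-F]+)\)')


def still_hung(lines):
    """Return {thread_id: thread_name} for threads reported as possibly
    hung that have no later 'completed' message."""
    open_reports = {}
    for line in lines:
        match = HUNG.search(line)
        if match:
            open_reports[match.group(2)] = match.group(1)
            continue
        match = CLEARED.search(line)
        if match:
            # Thread finished after all, so drop it from the open reports.
            open_reports.pop(match.group(2), None)
    return open_reports
```

Because the function only iterates over lines, an open SystemOut.log file object can be passed directly as `lines`. An empty result suggests the warnings came from slow threads rather than threads that stayed hung.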

Upvotes: 1

Brian S Paskin

Reputation: 181

You are receiving CPU starvation warnings (DCSV0004W). This could be because you are thrashing the garbage collector, because your heap is not big enough, or because something else is taking up the CPU time. You need to find the process or processes that are consuming the CPU and examine why their usage is so high.
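
To see how often and how badly the server was starved, the DCSV0004W entries can be pulled out of the log. This is a rough Python sketch that assumes every such entry follows the layout of the DCSV0004W line in the question; `starvation_events` is a hypothetical helper name.

```python
import re

# Layout assumed from the DCSV0004W line in the question:
# "[timestamp] ... DCSV0004W: ... Inactivity duration was N seconds."
STARVED = re.compile(
    r'^\[(?P<stamp>[^\]]+)\].*DCSV0004W:.*'
    r'Inactivity duration was (?P<secs>\d+) seconds'
)


def starvation_events(lines):
    """Return (timestamp, inactivity_seconds) for each DCSV0004W entry."""
    events = []
    for line in lines:
        match = STARVED.search(line)
        if match:
            events.append((match.group("stamp"), int(match.group("secs"))))
    return events
```

Lining these timestamps up against garbage collection activity or operating system CPU figures for the same period may help narrow down which of the causes above applies.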

Regards, Brian

Upvotes: 2
