ethanxyz_0

Reputation: 743

XNIO WorkerThread (default I/O-x) with 100% CPU usage

We are changing our cluster topology and facing a 100% CPU issue in the "default I/O-x" thread (org.xnio.nio.WorkerThread).

It looks like an infinite loop in ConduitStreamSinkChannel.write(..), but I'm not sure about it.

The problem is intermittent and hard to reproduce, and only a restart stops it.

Every time that it enters this state, the thread dump is the same:

"default I/O-2" #103 prio=5 os_prio=0 tid=0x00007fd0f83a4800 nid=0x6941 runnable [0x00007fd0e9be0000]
   java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
- locked <0x00000007a2be6ec0> (a java.nio.channels.ClosedChannelException)
at java.lang.Throwable.<init>(Throwable.java:250)
at java.lang.Exception.<init>(Exception.java:54)
at java.io.IOException.<init>(IOException.java:47)
at java.nio.channels.ClosedChannelException.<init>(ClosedChannelException.java:52)
at org.xnio.ssl.JsseStreamConduit.write(JsseStreamConduit.java:1022)
at org.xnio.conduits.ConduitStreamSinkChannel.write(ConduitStreamSinkChannel.java:150)
at org.xnio.http.HttpUpgrade$HttpUpgradeState$StringWriteListener.handleEvent(HttpUpgrade.java:385)
at org.xnio.http.HttpUpgrade$HttpUpgradeState$StringWriteListener.handleEvent(HttpUpgrade.java:372)
at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at org.xnio.conduits.WriteReadyHandler$ChannelListenerHandler.writeReady(WriteReadyHandler.java:65)
at org.xnio.ssl.JsseStreamConduit.run(JsseStreamConduit.java:393)
at org.xnio.ssl.JsseStreamConduit.writeReady(JsseStreamConduit.java:524)
at org.xnio.ssl.JsseStreamConduit$1.writeReady(JsseStreamConduit.java:287)
at org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:94)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:567)

The topology change adds a few more server groups and reconfigures the reverse proxy (nginx) to redirect (status 301) every HTTP connection to HTTPS.
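For illustration, that redirect at nginx looks roughly like this (the server name below is a placeholder, not our actual config):

    server {
        listen 80;
        server_name app.example.com;            # placeholder name
        # send every plain-HTTP request to HTTPS with a 301
        return 301 https://$host$request_uri;
    }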

Analyzing the thread dump, it seems to be related to HTTPS (package org.xnio.ssl.*), but everything behind nginx is plain HTTP.

Our application provides a WebSocket endpoint (WSS through nginx) and makes a lot of EJB remote invocations to other servers (http-remoting, directly to the other server without passing through nginx).

The application also makes and receives some REST calls (RESTEasy) to other servers in the cluster through the load balancer (nginx over HTTPS, in order to avoid the 301 redirect).
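Concretely, those REST calls go straight to the HTTPS listener of the load balancer, so the 301 never applies. A minimal sketch with the plain JAX-RS client (the load-balancer URL below is a placeholder):

    import javax.ws.rs.client.Client;
    import javax.ws.rs.client.ClientBuilder;
    import javax.ws.rs.core.Response;

    public class RestCallSketch {
        public static void main(String[] args) {
            // Placeholder load-balancer URL; calling HTTPS directly avoids the 301 on port 80
            Client client = ClientBuilder.newClient();
            Response response = client
                    .target("https://lb.example.com/api/status")
                    .request()
                    .get();
            System.out.println("HTTP " + response.getStatus());
            response.close();
            client.close();
        }
    }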

What could be causing this?

thanks!

Upvotes: 1

Views: 1041

Answers (1)

Awan Biru

Reputation: 484

I hope it is not too late for me to provide an answer.

We had the same problem as described in the question above. The EAP (JBoss) version is 7.0.x, and the EAP instances processed HTTPS requests from several NGINX instances set up as reverse proxies.

Under a certain traffic profile, we noticed that the affected EAP instances consumed 99% of the CPU and the web applications within them became inaccessible.

It was as if the EAP instances were frozen. We captured the logs, and when we cross-checked them with the thread dump, the default I/O thread below consumed the most CPU.

12:56:10,512 INFO  [stdout] (default I/O-11) io.undertow.server.protocol.http.AlpnOpenListener$AlpnConnectionListener.handleEvent(AlpnOpenListener.java:356)
12:56:10,512 INFO  [stdout] (default I/O-11) io.undertow.server.protocol.http.AlpnOpenListener$AlpnConnectionListener.handleEvent(AlpnOpenListener.java:341)
12:56:10,512 INFO  [stdout] (default I/O-11) org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
12:56:10,512 INFO  [stdout] (default I/O-11) org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)
12:56:10,512 INFO  [stdout] (default I/O-11) io.undertow.protocols.ssl.SslConduit$SslReadReadyHandler.readReady(SslConduit.java:1291)
12:56:10,512 INFO  [stdout] (default I/O-11) org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:89)
12:56:10,512 INFO  [stdout] (default I/O-11) org.xnio.nio.WorkerThread.run(WorkerThread.java:591)
10:14:29,798 INFO  [stdout] (default I/O-21) io.undertow.protocols.ssl.SslConduit.doUnwrap(SslConduit.java:850)
10:14:29,798 INFO  [stdout] (default I/O-21) io.undertow.protocols.ssl.SslConduit.read(SslConduit.java:587)
10:14:29,798 INFO  [stdout] (default I/O-21) org.xnio.conduits.AbstractStreamSourceConduit.read(AbstractStreamSourceConduit.java:51)
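For anyone who wants to do the same cross-check: list the hottest native threads and match their ids against the nid values in a thread dump. A sketch, assuming a Linux host and a placeholder PID:

    # list the process's threads sorted by CPU usage (PID 12345 is a placeholder)
    top -H -p 12345

    # convert the hot thread id to hex; thread dumps print it as nid=0x...
    printf 'nid=0x%x\n' 26945     # -> nid=0x6941, the thread shown in the question

    # take a thread dump and locate that thread
    jstack 12345 | grep -A 20 'nid=0x6941'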

The problem was caused by the default I/O threads being unable to obtain a new thread from the pool because of thread pool exhaustion. This behavior is described in the Red Hat KB article linked below:

https://access.redhat.com/solutions/7031598
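For reference, the pool referred to here appears to be the XNIO worker's task pool. On EAP 7 it can be inspected and, if needed, enlarged through the io subsystem via jboss-cli; a sketch with the default worker and a placeholder value (the exact remediation in the KB may differ):

    # jboss-cli.sh --connect
    /subsystem=io/worker=default:read-resource(include-runtime=true)
    /subsystem=io/worker=default:write-attribute(name=task-max-threads, value=128)
    reload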

Once we applied rate limiting and distributed the load across more EAP instances, the high CPU problem was resolved. Hope this helps.
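For illustration, the rate limit at the nginx reverse proxies can be sketched like this (the zone name, rate, and upstream below are placeholders, not our production values):

    # shared zone keyed by client address, capped at 10 requests/second
    limit_req_zone $binary_remote_addr zone=eap_limit:10m rate=10r/s;

    server {
        listen 443 ssl;                       # certificate directives omitted
        location / {
            limit_req zone=eap_limit burst=20 nodelay;
            proxy_pass http://eap_backend;    # placeholder upstream of EAP instances
        }
    }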

Upvotes: 0
