Reputation: 67484
We've been experiencing a production issue that requires restarting our Tomcat servers to fix. We can reproduce it with a performance test, but we're not sure exactly what series of events triggers it. It seems that if we throw enough traffic at a Tomcat instance, it eventually ends up in the following state:
Tomcat has 200 threads available, but all 200 of them are doing this:
"http-nio-8080-exec-1" #41 daemon prio=5 os_prio=0 tid=0x00007f7264a5b800 nid=0xdf9 runnable [0x00007f71e39fd000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:101)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:144)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:174)
- locked <0x000000008b9bbb60> (a com.mysql.jdbc.util.ReadAheadInputStream)
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3008)
at com.mysql.jdbc.MysqlIO.readPacket(MysqlIO.java:567)
at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1016)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2188)
at com.mysql.jdbc.ConnectionImpl.connectWithRetries(ConnectionImpl.java:2035)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2019)
- locked <0x000000008b9bbc88> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:776)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.GeneratedConstructorAccessor69.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:386)
at com.mysql.jdbc.MultiHostConnectionProxy.createConnectionForHost(MultiHostConnectionProxy.java:346)
- eliminated <0x000000008b9bc188> (a com.mysql.jdbc.LoadBalancedConnectionProxy)
at com.mysql.jdbc.LoadBalancedConnectionProxy.createConnectionForHost(LoadBalancedConnectionProxy.java:372)
- eliminated <0x000000008b9bc188> (a com.mysql.jdbc.LoadBalancedConnectionProxy)
at com.mysql.jdbc.RandomBalanceStrategy.pickConnection(RandomBalanceStrategy.java:73)
at com.mysql.jdbc.LoadBalancedConnectionProxy.pickNewConnection(LoadBalancedConnectionProxy.java:317)
- locked <0x000000008b9bc188> (a com.mysql.jdbc.LoadBalancedConnectionProxy)
at com.mysql.jdbc.LoadBalancedConnectionProxy.<init>(LoadBalancedConnectionProxy.java:229)
at com.mysql.jdbc.LoadBalancedConnectionProxy.createProxyInstance(LoadBalancedConnectionProxy.java:105)
at com.mysql.jdbc.NonRegisteringDriver.connectLoadBalanced(NonRegisteringDriver.java:374)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:314)
at com.mysql.jdbc.ReplicationConnectionProxy.initializeMasterConnection(ReplicationConnectionProxy.java:415)
at com.mysql.jdbc.ReplicationConnectionProxy.<init>(ReplicationConnectionProxy.java:182)
at com.mysql.jdbc.ReplicationConnectionProxy.createProxyInstance(ReplicationConnectionProxy.java:85)
at com.mysql.jdbc.NonRegisteringDriver.connectReplicationConnection(NonRegisteringDriver.java:459)
at com.mysql.jdbc.NonRegisteringReplicationDriver.connect(NonRegisteringReplicationDriver.java:46)
at com.myapp.rest.JSONService.setUpConnection(JSONService.java:1278) ******************************************************************************
at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$$Lambda$127/439202272.invoke(Unknown Source)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:243)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:268)
at org.glassfish.jersey.internal.Errors.process(Errors.java:316)
at org.glassfish.jersey.internal.Errors.process(Errors.java:298)
at org.glassfish.jersey.internal.Errors.process(Errors.java:268)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703)
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:676)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1132)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1533)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1489)
- locked <0x000000008a974068> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
I know that's a big wall of text, but you can find our code by searching for ****************************************************************************** in that stack trace.
Since all available threads are waiting to grab a connection, Tomcat becomes unresponsive. When Tomcat gets locked up like this, there are no connections on the MySQL master (although I haven't checked the slaves). So it appears that the Java code is waiting for a connection from the database while the database has no record of Java trying to connect to it.
This database has 7k available connections and through New Relic we can see that there are never more than 3k made. So it's not an issue of the DB being out of connections.
Here is the code used to make the connection:
Class.forName("com.mysql.jdbc.ReplicationDriver");
final String JDBC_CONNECTION_STRING = System.getProperty("JDBC_CONNECTION_STRING");
final String DB_USER_STRING = System.getProperty("DB_USER_STRING");
final String DB_PASSWORD_STRING = System.getProperty("DB_PASSWORD_STRING");
ReplicationDriver driver = new ReplicationDriver();
Properties props = new Properties();
// We want this for failover on the slaves
props.put("autoReconnect", "true");
// We want to load balance between the slaves
props.put("roundRobinLoadBalance", "true");
props.put("user", DB_USER_STRING);
props.put("password", DB_PASSWORD_STRING);
props.put("useLocalSessionState", "true");
props.put("useLocalTransactionState", "true");
props.put("connectTimeout", System.getProperty("MYSQL_CLIENT_CONNECT_TIMEOUT_MS"));
props.put("socketTimeout", System.getProperty("MYSQL_CLIENT_SOCKET_TIMEOUT_MS"));
if (use_utf8) {
    props.put("useUnicode", "yes");
    props.put("characterEncoding", "UTF-8");
}
connect = driver.connect(JDBC_CONNECTION_STRING, props); //all threads block on this line
connect.setReadOnly(false);
As a workaround, we added the lines for MYSQL_CLIENT_CONNECT_TIMEOUT_MS and MYSQL_CLIENT_SOCKET_TIMEOUT_MS. Setting these prevents the machines from getting locked up, but we haven't really solved the underlying issue. I'd like to fix the underlying issue: Why is the Java code waiting for connections to the database?
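For reference, Connector/J's connectTimeout and socketTimeout are both in milliseconds and both default to 0, meaning no timeout, which is consistent with threads sitting in socketRead0 indefinitely during the handshake. A minimal sketch of setting them defensively follows. The class name and the fallback values are hypothetical and not part of our code; the fallbacks only guard against Properties.put throwing a NullPointerException when a system property is unset.

```java
import java.util.Properties;

final class MysqlTimeoutProps {
    // Hypothetical fallback values, chosen only for illustration.
    static final String DEFAULT_CONNECT_TIMEOUT_MS = "5000";
    static final String DEFAULT_SOCKET_TIMEOUT_MS = "30000";

    // Adds connectTimeout and socketTimeout to the given properties.
    // System.getProperty(key, default) never returns null here, so the
    // underlying Hashtable never sees a null value.
    static Properties withTimeouts(Properties props) {
        props.setProperty("connectTimeout",
                System.getProperty("MYSQL_CLIENT_CONNECT_TIMEOUT_MS", DEFAULT_CONNECT_TIMEOUT_MS));
        props.setProperty("socketTimeout",
                System.getProperty("MYSQL_CLIENT_SOCKET_TIMEOUT_MS", DEFAULT_SOCKET_TIMEOUT_MS));
        return props;
    }
}
```

In the connection code above, this would replace the two props.put calls for the timeouts, for example MysqlTimeoutProps.withTimeouts(props) just before driver.connect(...).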
We are using MySQL on RDS, version 5.6.10.
I'm using Tomcat 8, JDK 8, Jersey 2.26, and the following Connector/J dependency:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.45</version>
</dependency>
It's probably worth noting that I'm connecting to this database with this JDBC URL prefix: jdbc:mysql:replication://. Also, this only seems to happen on RDS, not locally. One difference between the two environments is that RDS has masters and slaves, while the local setup does not.
Upvotes: 0
Views: 1378
Reputation: 308998
If you're using a database connection pool, this would suggest that the pool has been exhausted, probably because connections are not being closed and returned to the pool properly.
It's not clear from your post how you've designed your application.
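As a rough illustration of what closing connections properly can look like, here is a minimal sketch using try-with-resources. It assumes connections come from a javax.sql.DataSource (pooled or not); ConnectionScope, SqlWork, and withConnection are hypothetical names, not anything from your code or from a library.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class ConnectionScope {
    // Hypothetical callback type: a unit of work that needs a connection.
    interface SqlWork<T> {
        T apply(Connection connection) throws SQLException;
    }

    // Borrows a connection, runs the work, and always closes the connection,
    // even if the work throws. With a pool, close() returns it to the pool.
    static <T> T withConnection(DataSource dataSource, SqlWork<T> work) throws SQLException {
        try (Connection connection = dataSource.getConnection()) {
            return work.apply(connection);
        }
    }
}
```

A request handler would then wrap its database access in ConnectionScope.withConnection(dataSource, c -> ...), so no code path can leave a connection open.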
Upvotes: 2