Reputation: 1044
I have been using HikariCP in my Spring Boot application and I am starting to run some load tests with JMeter.
I noticed that the first time I run my tests, everything goes well and each request takes around 30ms.
But each time I run the tests again, against the same application instance, the response times get worse, until the application freezes and I get a whole lot of
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30019ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:583)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:112)
at sun.reflect.GeneratedMethodAccessor501.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at net.bull.javamelody.JdbcWrapper$3.invoke(JdbcWrapper.java:805)
at net.bull.javamelody.JdbcWrapper$DelegatingInvocationHandler.invoke(JdbcWrapper.java:286)
at com.sun.proxy.$Proxy102.getConnection(Unknown Source)
at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:246)
... 108 common frames omitted
I even left the application idle for a day and tried again, but the tests show degraded performance and the same errors.
Only if I restart the application can I run my tests again, but only for one load (1200+ requests).
When I was developing the tests I was running my local app with an H2 database and didn't notice any degradation until I deployed my application on a server running PostgreSQL.
So, to take that variable out of the way, I ran JMeter against my local H2 app and the degradation showed up there as well.
Here is a test scenario I ran on my local app (H2 database), with the default HikariCP pool size (10), using 10 threads. I managed to run 25000+ requests before the application stopped responding.
I plotted the requests:
Also, the tests consist of a request to a Spring Boot @RestController. My controller calls a service that has @Transactional at the start (I call some legacy APIs that require a transaction to exist, so I open it right away).
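Roughly, the setup looks like this (a minimal sketch; the class, method and endpoint names are made up):
// LegacyController.java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class LegacyController {

    private final LegacyService legacyService;

    public LegacyController(LegacyService legacyService) {
        this.legacyService = legacyService;
    }

    @GetMapping("/legacy")
    public String doWork() {
        return legacyService.doWork();
    }
}

// LegacyService.java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class LegacyService {

    // a pooled connection is acquired when the transaction starts and held until the method returns
    @Transactional
    public String doWork() {
        // calls to legacy APIs that expect an open transaction
        return "ok";
    }
}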
So let's say I have my tests requesting this endpoint 10 times in parallel. Let's also say that my code might have other points annotated with @Transactional. Would a pool size of 10 be enough?
Also, should any pool size be enough, despite having poor performance, or is it "normal" to have this kind of scenario where the pool just gets too busy and "locks"?
I also tried increasing the pool size to 50 but the problem persists. It gets close to the 25000 requests from the previous tests (with pool size 10) and fails as stated before.
Upvotes: 4
Views: 4198
Reputation: 1044
So, it was a memory leak after all. Nothing to do with HikariCP.
We have some Groovy scripts using @Memoized with some really bad cache keys (huge objects), and that cache kept getting bigger until there was no memory left.
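For anyone curious, the effect was roughly equivalent to this (an illustrative Java sketch; our real code is Groovy using @Memoized, and the names here are made up):
// ReportCache.java - illustrates memoization keyed on a huge object
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReportCache {

    // Grows without bound: every distinct request object becomes a new key,
    // and each key pins a large object graph in memory.
    private final Map<HugeRequestObject, Result> cache = new ConcurrentHashMap<>();

    public Result compute(HugeRequestObject request) {
        return cache.computeIfAbsent(request, this::expensiveCalculation);
    }

    private Result expensiveCalculation(HugeRequestObject request) {
        // stand-in for the real (slow) computation
        return new Result();
    }

    static class HugeRequestObject { /* large object graph used as the cache key */ }

    static class Result { }
}
Once the heap was exhausted, everything slowed down and requests piled up waiting for connections, which is why it looked like a pool problem.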
Upvotes: 2
Reputation: 44962
HikariCP suggests using a small, constant-size pool saturated with threads waiting for connections. Per the docs, the suggested pool size is:
connections = ((core_count * 2) + effective_spindle_count)
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count). Core count should not include HT threads, even if hyperthreading is enabled. Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls. ... There hasn't been any analysis so far regarding how well the formula works with SSDs.
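For example, assuming a 4-core application server with the active data set fully cached (effective_spindle_count = 0), the formula gives (4 * 2) + 0 = 8 connections, so the default pool size of 10 is already in the right range and raising it to 50 is unlikely to help.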
An in-memory H2 database with a small dataset will be faster than a standalone database running on a different server. Even if you are running in the same datacenter, the round-trip between servers is usually around 0.5-1ms.
Try to find the current bottleneck first. If the application server isn't running out of CPU, then the problem is somewhere else, e.g. the database server. If you can't figure out where the current bottleneck is, you may end up optimising in the wrong place.
Upvotes: 4