Reputation: 4317
I have deployed a Spring Boot application with a database-based job queue on App Service.
Yesterday I performed a few scale-out and scale-in operations while the application was running, to see how it would behave.
At some point (not necessarily related to the scaling operations) the application started to throw HikariCP errors:
com.zaxxer.hikari.pool.PoolBase : HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@1ae66f34 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
com.zaxxer.hikari.pool.ProxyConnection : HikariPool-1 - Connection org.postgresql.jdbc.PgConnection@1ef85079 marked as broken because of SQLSTATE(08006), ErrorCode(0)
The following are stack traces from my scheduled job in Spring, along with other information:
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
Caused by: javax.net.ssl.SSLException: Connection reset by peer (Write failed)
Suppressed: java.net.SocketException: Broken pipe (Write failed)
Caused by: java.net.SocketException: Connection reset by peer (Write failed)
Next came the following series of errors:
WARN 1 --- [ scheduling-1] com.zaxxer.hikari.pool.PoolBase : HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@48d0d6da (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
org.springframework.jdbc.support.MetaDataAccessException: Error while extracting DatabaseMetaData; nested exception is java.sql.SQLException: Connection is closed
Caused by: java.sql.SQLException: Connection is closed
The code that is invoked periodically (every 500 milliseconds) is here:
@Scheduled(fixedDelayString = "${worker.delay}")
@Transactional
public void execute() {
    jobManager.next(jobClass).ifPresent(this::handleJob);
}
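(Presumably worker.delay is bound in application.properties; given the 500 ms interval mentioned above, it would look something like this:)

worker.delay=500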
Update: The above code does nothing almost all of the time, since there was no traffic on the website.
Update 2: I've checked the Postgres logs and found this:
2020-07-11 22:48:09 UTC-5f0866f0.f0-LOG: checkpoint starting: immediate force wait
2020-07-11 22:48:10 UTC-5f0866f0.f0-LOG: checkpoint complete (240): wrote 30 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.046 s, sync=0.046 s, total=0.437 s; sync files=13, longest=0.009 s, average=0.003 s; distance=163 kB, estimate=13180 kB
2020-07-11 22:48:10 UTC-5f0866ee.68-LOG: received immediate shutdown request
2020-07-11 22:48:10 UTC-5f0a3f41.8914-WARNING: terminating connection because of crash of another server process
2020-07-11 22:48:10 UTC-5f0a3f41.8914-DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
// Same text about 10 times
2020-07-11 22:48:10 UTC-5f0866f2.7c-HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-07-11 22:48:10 UTC-5f0866ee.68-LOG: src/port/kill.c(84): Process (272) exited OOB of pgkill.
2020-07-11 22:48:10 UTC-5f0866f1.fc-WARNING: terminating connection because of crash of another server process
2020-07-11 22:48:10 UTC-5f0866f1.fc-DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-07-11 22:48:10 UTC-5f0866f1.fc-HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-07-11 22:48:10 UTC-5f0866ee.68-LOG: archiver process (PID 256) exited with exit code 1
2020-07-11 22:48:11 UTC-5f0866ee.68-LOG: database system is shut down
It looks like this is a problem with the Azure PostgreSQL server and it shut itself down. Am I reading this right?
Upvotes: 2
Views: 2362
Reputation: 2030
As the log message itself suggests, have you tried setting the maxLifetime property for HikariCP? I think this issue should be resolved after setting that property (see the sketch after the quoted docs).
From the HikariCP documentation (https://github.com/brettwooldridge/HikariCP):
maxLifetime
This property controls the maximum lifetime of a connection in the pool. An in-use connection will never be retired, only when it is closed will it then be removed. On a connection-by-connection basis, minor negative attenuation is applied to avoid mass-extinction in the pool. We strongly recommend setting this value, and it should be several seconds shorter than any database or infrastructure imposed connection time limit. A value of 0 indicates no maximum lifetime (infinite lifetime), subject of course to the idleTimeout setting. The minimum allowed value is 30000ms (30 seconds). Default: 1800000 (30 minutes)
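As a minimal sketch, assuming you are using Spring Boot's auto-configured Hikari pool (so the standard spring.datasource.hikari.* properties apply), you could set it in application.properties. The 10-minute value below is only illustrative; pick something several seconds shorter than whatever connection lifetime your Azure PostgreSQL server or its gateway actually enforces:

# Retire pooled connections after 10 minutes (value in milliseconds),
# before the server-side limit can close them out from under the pool.
spring.datasource.hikari.max-lifetime=600000

With this in place, HikariCP proactively replaces connections before the server can close them, which avoids the "Failed to validate connection" warnings caused by server-side timeouts.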
Upvotes: 1