LewlSauce

Reputation: 5872

Sidekiq, Redis, and Postgresql connection pools. What should my settings look like given my scenario?

For quite some time, I have been receiving connection pool timeout errors while using Sidekiq workers with Postgres, along with errors like this:

2020-06-17T16:15:33.701Z pid=1132347 tid=goeqpzc8b WARN: PG::ConnectionBad: FATAL: sorry, too many clients already

I have tried to understand how this all works, but I cannot seem to get away from these errors no matter what I adjust my settings to.

Right now, my config/database.yml file uses these settings as its options:

default: &default
  adapter: postgresql
  encoding: unicode
  pool: 20
  port: 25060

In my config/sidekiq.yml file, I also have the following options set:

development:
  :concurrency: 20
production:
  :concurrency: 20
:queues:
  - default

and according to my database settings from Digital Ocean, I have a backend server connection limit of 22.

With that being said, why do the Sidekiq jobs fail instead of just queueing up and waiting to run? Each of my Sidekiq workers has code to CRUD entries through ActiveRecord, but I'm not sure how much that matters inside the actual workers.

To my understanding, my limit with the managed database is 22 connections at most, and the settings in my database.yml and sidekiq.yml files are below that limit, so I'm not sure how or why it still fails rather than waiting in the queue until there's room.

Given this scenario, are there suggested pool and concurrency settings that I should be using? Some of my Sidekiq workers monitor a particular record to see when it's updated, so they continuously check the record and then proceed once the expected information is in there.

I am looking for any suggestions or clarification on this.

Upvotes: 0

Views: 638

Answers (1)

jjanes

Reputation: 44137

PG::ConnectionBad: FATAL: sorry, too many clients already

The gap between 20 and 22 doesn't give you much margin of safety. If, for example, you have pgAdmin4 open and then a monitoring connection starts up (both bypassing the connection pool), that would put you over the limit whenever the pool is close to fully used. If you can't increase max_connections, try reducing the pool size by a couple to give yourself more headroom. That could make the other problem worse, but going from two problems to one problem is a kind of progress.
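As a sketch, reducing the pool is a one-line change in the database.yml shown in the question (the value 18 here is just an illustration of "a couple below the limit", not a tested recommendation):

```yaml
default: &default
  adapter: postgresql
  encoding: unicode
  # Keep a few connections free for tools that bypass the pool
  # (pgAdmin4, monitoring), assuming a 22-connection server limit.
  pool: 18
  port: 25060
```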

With that being said, why do the sidekiq jobs fail instead of just queueing up and waiting to run?

I'm not a Sidekiq user, but apparently it is only willing to wait so long, and that period is being exceeded. At the very least, "connection pool timeout" is not a message generated by PostgreSQL, so it is coming from somewhere else; based on the wording, the pool manager is the obvious candidate.
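If the timeout is coming from ActiveRecord's pool manager, the relevant knob is `checkout_timeout`, which controls how long a thread waits for a free connection before `ActiveRecord::ConnectionTimeoutError` is raised (5 seconds by default in Rails). A hedged sketch of raising it, assuming the asker's database.yml; this only trades errors for longer waits rather than fixing the underlying shortage:

```yaml
default: &default
  adapter: postgresql
  pool: 20
  # Seconds a thread waits for a free connection before the pool
  # raises ActiveRecord::ConnectionTimeoutError (Rails default: 5).
  checkout_timeout: 10
```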

Upvotes: 2
