Reputation: 6896
We have a pretty complex Rails setup on Heroku.
In a typical day we have about 10 Web Dynos with Unicorn (each 2x and running 3 Unicorn workers) and 15 Worker Dynos running Delayed Jobs, though it fluctuates, so we use hirefire to scale up and down when we can to save on cost. Our Postgres database allows for 400 connections.
Last week I finally got fed up with the Delayed::Job queue we had been using for several years. We have a series of jobs that run every 10 minutes, and it got to the point where running them all took more than 10 minutes, so the queue kept backing up. I decided to move over to Sidekiq, as I'd had some success with it in the past.
It is working decently well so far, though I am finding our web dynos to be way less consistent. For example, here is our New Relic graph of a 3-hour period yesterday:
But here's what the exact same time period looked like the week before:
Basically, before Sidekiq our jobs didn't seem to be affecting our web dynos at all, but now they are. My only guess is that when the every-10-minute jobs run, they temporarily exhaust our Postgres connections, which slows down the web dynos. It's the only way I can imagine the jobs would affect the web.
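To sanity-check that guess, here's the rough connection arithmetic using the figures above (a back-of-the-envelope budget; actual usage depends on how full each pool gets at any moment):

```ruby
# Rough Postgres connection budget based on the numbers in this question.
# Each Unicorn worker holds up to `pool` (5, from database.yml) ActiveRecord
# connections; each Sidekiq dyno overrides the pool to match its concurrency.
web_dynos       = 10
unicorn_workers = 3   # per web dyno
web_pool        = 5   # database.yml pool

worker_dynos    = 15
sidekiq_pool    = 15  # WORKER_POOL override in sidekiq.rb

web_connections    = web_dynos * unicorn_workers * web_pool
worker_connections = worker_dynos * sidekiq_pool
total              = web_connections + worker_connections

puts total # 375 — uncomfortably close to the 400-connection plan limit
```

So at peak the workers alone can claim more than half the available connections, which would explain the web dynos feeling the squeeze.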
Any thoughts on how to keep these better isolated from each other, so our web response time stays consistent?
Here's our sidekiq.yml:
---
:concurrency: 5
production:
  :concurrency: <%= ENV['WORKER_POOL'] || 15 %>
:queues:
  - [instant, 3]
  - [fetchers, 2]
  - [mailers, 1]
  - [fetch_all, 1]
  - [moderation, 1]
  - [default, 1]
  - [reports, 1]
  - [images, 1]
  - [slack, 1]
And our sidekiq.rb:

require 'sidekiq'

Sidekiq.configure_server do |config|
  database_url = ENV['DATABASE_URL']
  if database_url
    pool = ENV['WORKER_POOL'] || 15
    new_database_url = "#{database_url}?pool=#{pool}"
    ActiveRecord::Base.establish_connection(new_database_url)
  end
end

Sidekiq.default_worker_options = { retry: 1 }
We are overriding the database pool size for the Sidekiq worker processes so each thread gets its own connection and we can take full advantage of the concurrency.
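An alternative sketch (not what we're currently running, and it assumes Sidekiq is loaded by the time database.yml is evaluated) is to size the pool directly in database.yml via ERB instead of re-establishing the connection in configure_server:

```yaml
production:
  database: myapp_production
  adapter: postgresql
  encoding: unicode
  # Match the pool to Sidekiq's concurrency in worker processes;
  # keep the smaller pool for Unicorn web processes.
  pool: <%= Sidekiq.server? ? (ENV['WORKER_POOL'] || 15) : 5 %>
```

This keeps the pool and concurrency defined in one place instead of two.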
And our database.yml:

production:
  database: myapp_production
  adapter: postgresql
  encoding: unicode
  pool: 5
And our unicorn.rb:

worker_processes 3
timeout 30
preload_app true
listen ENV['PORT'], backlog: Integer(ENV['UNICORN_BACKLOG'] || 200)

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end

  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end

  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
end
And our Procfile:
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
redis: redis-server
worker: bundle exec sidekiq -e production -C config/sidekiq.yml
Our HireFire managers are set up separately for Web and for Workers (manager settings not shown).
Any suggestions?
Upvotes: 0
Views: 194
Reputation: 6896
It remains to be seen, but a very promising fix seems to be turning off prepared_statements for the Sidekiq workers in my config/database.yml:
default: &default
  adapter: postgresql
  encoding: unicode
  pool: 5
  prepared_statements: <%= !Sidekiq.server? %>
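For reference, `Sidekiq.server?` is true only inside a process booted by the `sidekiq` CLI, so the ERB line above evaluates differently per process type. A minimal sketch of the resulting values, using a hypothetical `sidekiq_server` flag as a stand-in for `Sidekiq.server?`:

```ruby
require "erb"

# Simulate how `prepared_statements: <%= !Sidekiq.server? %>` evaluates
# in each process type. `sidekiq_server` is a hypothetical stand-in for
# Sidekiq.server? (which reads Sidekiq's internal process state).
def prepared_statements_setting(sidekiq_server)
  ERB.new("<%= !sidekiq_server %>").result(binding)
end

puts prepared_statements_setting(false) # web (Unicorn) process    -> "true"
puts prepared_statements_setting(true)  # Sidekiq worker process   -> "false"
```

So web processes keep prepared statements while the workers skip them.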
Upvotes: 1