port5432

Reputation: 6391

How to manage a pool of background servers in Rails

Our Rails application has some very intensive background processes, sometimes taking several hours to run. We are using delayed_job, and would consider moving to Resque or the free version of Sidekiq if it made sense in the context of this question.

We are hitting 100% CPU on all processors for some of the jobs, and currently the background processors run on the same physical server as Nginx, Rails and Postgres. We also expect the load to rise.

We would like to move the background processing off to a pool of commodity-level batch processing VMs, and preferably spin them up as needed. My idea is to extract the perform code into mini-apps and put them on the batch processing VMs.

What I am not sure about is how to code this, and also how to load-balance the job queues across the different VMs. Is this something that delayed_job/Resque/Sidekiq can do, or do I need to code it myself?

EDIT

Some useful links I have found on this topic

http://www.slideshare.net/kigster/12step-program-for-scaling-web-applications-on-postgresql

Use multiple Redis servers in Sidekiq

https://stackoverflow.com/a/19540427/993592

Upvotes: 3

Views: 891

Answers (2)

Eric

Reputation: 11

I've seen Sidekiq workers hang during network operations, eventually stopping all jobs from running, with no way of knowing until users complain.

ConeyIsland offers more control over job execution than Sidekiq does, and it uses RabbitMQ as its message bus, which is more robust and scales better than Redis.

You can set per-queue and per-job timeouts and configure retry behavior, and a bad job will never cause the worker to hang: it will always continue working other jobs.

Exceptions in jobs are pushed to the notification service of your choice, so you will know when a job goes bad.

http://edraut.github.io/coney_island/

Upvotes: 1

Philip Hallstrom

Reputation: 19889

My personal preference is Sidekiq. I'd be a little concerned about "several hour" jobs and what happens if they fail in the middle. By default Sidekiq will try to re-run them. You can change that, but you definitely want to think through that scenario. This will of course be true for whatever background job processing system you use, though. IMHO I'd try to find a way to break those big jobs up into smaller jobs. Even if it's just "job part 1 runs, then enqueues job part 2, etc.".
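The "part 1 enqueues part 2" pattern can be sketched in plain Ruby. Here an in-memory `QUEUE` array stands in for Sidekiq/Redis, and the class names and batch id are made up for illustration; with real Sidekiq you would `include Sidekiq::Worker` in each class and hand off with `ImportPart2.perform_async(batch_id)`:

```ruby
# In-memory stand-in for the Sidekiq/Redis queue (illustrative only).
QUEUE = []

class ImportPart1
  def perform(batch_id)
    # ...do the first chunk of heavy work for this batch...
    # Instead of running for hours, hand the rest off as a new job.
    QUEUE << [ImportPart2, batch_id]
  end
end

class ImportPart2
  def perform(batch_id)
    # ...finish the work...
    "batch #{batch_id} done"
  end
end

# A worker loop drains the queue one small job at a time. If part 2
# fails, only part 2 is retried -- the hours spent in part 1 are kept.
QUEUE << [ImportPart1, 42]
results = []
until QUEUE.empty?
  klass, arg = QUEUE.shift
  results << klass.new.perform(arg)
end
results.last  # => "batch 42 done"
```

The point of the pattern is that each enqueued piece is small enough to retry cheaply, which is exactly what makes Sidekiq's default retry behavior safe for long workloads.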

As for scalability Sidekiq's only real limit is Redis. See here for some options on that: https://github.com/mperham/sidekiq/wiki/Sharding

As for load balancing, Sidekiq does it by default. I run two Sidekiq servers now that pull from a single Redis instance, 25 workers on each with about 12 queues. Works amazingly well.
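That setup needs no extra code: point every Sidekiq process on every VM at the same Redis URL, and each one pulls jobs from the shared queues. Each process carries a config along these lines (the queue names below are illustrative):

```yaml
# config/sidekiq.yml -- same file on every worker VM; all processes
# connect to the same Redis, and Sidekiq balances work across them.
:concurrency: 25
:queues:
  - critical
  - default
  - low
```

Spinning up another batch VM is then just starting another `sidekiq` process with this config and the shared `REDIS_URL`.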

Upvotes: 2
